THREE NOVEL GENES ENCODING A ZINC FINGER PROTEIN, A GUANINE NUCLEOTIDE EXCHANGE FACTOR
AND A HEAT SHOCK PROTEIN OR HEAT SHOCK BINDING PROTEIN

FIELD OF THE INVENTION 

The present invention relates generally to novel human genes and their derivatives and to
mammalian, animal, insect, nematode, avian and microbial homologues thereof. The present
invention further provides pharmaceutical compositions and diagnostic agents as well as genetic 
molecules useful in gene replacement therapy and recombinant molecules useful in protein 
replacement therapy. 

BACKGROUND OF THE INVENTION 

Bibliographic details of the publications referred to by author in this specification are collected
at the end of the description.

The increasing sophistication of recombinant DNA technology is greatly facilitating research and 
development in the medical and allied health fields. There is growing need to develop 
recombinant and genetic molecules for use in diagnosis and in conventional pharmaceutical 
preparations as well as in gene and protein replacement therapies. 

In work leading up to the present invention, the inventors sought to identify and clone human 
genes which might be useful as potential diagnostic and/or therapeutic agents. Molecules of 
particular interest targeted by the inventors were gene regulators including regulatory proteins, 
signal transducers and heat shock proteins. 

Gene expression generally requires interaction between a regulatory protein and an appropriate 
recognition sequence of a target gene. Regulatory proteins comprise in many cases a domain or 
motif which facilitates binding to DNA. One particular motif comprises small sequence units 
repeated in tandem with each unit folded about a zinc atom to form separate structural domains. 
This motif is now referred to as a zinc finger domain. Such a domain is generally defined by the
number of cysteine (C) and histidine (H) residues.

In addition, knowledge of cellular interaction in the control of cell proliferation is essential in the
rational design of specific therapeutic strategies aimed at controlling proliferative disorders.
Such proliferative disorders include a range of cancers, inflammatory conditions and
atherosclerosis. An important aspect of cellular interaction is signal transduction via receptors
to intracellular transducers. One key signal transducer is Ras, which couples the receptors for
diverse extracellular signals to different effectors. Ras directly activates the downstream kinase
Raf, which in turn induces the mitogen activated protein kinase (MAPK) cascade.

Another regulatory mechanism involves heat shock proteins. The Escherichia coli heat shock 
protein, DnaJ, is the founding member of a family of proteins which are associated with protein
folding, protein complex assembly and transit through subcellular components. 

Prokaryotic and eukaryotic DnaJ homologues have a modular organisation consisting of a J 
domain, a glycine-rich spacer, CXXCXGXG [SEQ ID NO: 1] repeats and a C-terminal region 
with no obvious sequence features, as well as additional sequences for protein targeting. The
J domain is anticipated to mediate interaction with heat shock 70 proteins (Hsp70) and consists 
of some 70 amino acids, frequently located at the N-terminus of the protein. 

In accordance with the present invention, genes have been identified from the human genome
which encode proteins having a regulatory role. One gene, in accordance with the present
invention encodes a protein with an N-terminal region resembling a zinc-finger domain of a novel 
type. Another gene encodes a protein involved in guanine nucleotide exchange factor (GEF) 
signalling pathways. Yet another gene encodes a protein which is a heat shock protein or heat 
shock-like protein which may have a role in tumour suppression. 

SUMMARY OF THE INVENTION 

Throughout this specification, unless the context requires otherwise, the word "comprise", or 
variations such as "comprises" or "comprising", will be understood to imply the inclusion of a 
stated element or integer or group of elements or integers but not the exclusion of any other
element or integer or group of elements or integers.

Sequence identity numbers (SEQ ID NOs.) for nucleotide and amino acid sequences referred to
in the subject specification are defined after the bibliography. A summary of SEQ ID NOs. is
also given in Table 1.

One aspect of the present invention contemplates an isolated nucleic acid molecule comprising
a sequence of nucleotides encoding or complementary to a sequence encoding an amino acid 
sequence having homology to a regulator of gene expression or a derivative of said gene 
regulator. 

Another aspect of the present invention provides an isolated nucleic acid molecule comprising
a sequence of nucleotides encoding or complementary to a sequence encoding a regulator of
gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 type.

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:2;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii).

The nucleotide sequence set forth in SEQ ID NO:2 defines the gene, mcg4. This gene encodes
a product, MCG4, having an amino acid sequence set forth in SEQ ID NO:3.

Even yet another aspect of the present invention provides a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg4
gene portion, which mcg4 gene portion is capable of encoding an MCG4 polypeptide or a
functional or immunologically interactive derivative thereof.

Still yet another aspect of the present invention contemplates a method of detecting a condition
caused or facilitated by an aberration in mcg4, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg4 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition.

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg4, said method comprising screening for a single or 
multiple amino acid substitution, deletion and/or addition to MCG4 wherein the presence of such
a mutation is indicative of said condition or of a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG4 or a 
derivative thereof in a biological sample said method comprising contacting said biological 
sample with an antibody specific for MCG4 or its derivatives or homologues for a time and under
conditions sufficient for an antibody-MCG4 complex to form, and then detecting said complex. 

A further aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a guanine nucleotide exchange factor (GEF) or a
derivative thereof. 

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 
or 7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions to the
nucleotide sequence set forth in (i), (ii) or (iii).

The nucleotide sequence set forth in SEQ ID NO:4 or 6 defines the gene, mcg7. This gene 
encodes a product, MCG7, having an amino acid sequence set forth in SEQ ID NO:5 or 7. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg7 
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a 
functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg7, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg7 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg7, said method comprising screening for a single or
multiple amino acid substitution, deletion and/or addition to MCG7 wherein the presence of such
a mutation is indicative of said condition or of a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG7 or a 
derivative thereof in a biological sample said method comprising contacting said biological 
sample with an antibody specific for MCG7 or its derivatives or homologues for a time and under
conditions sufficient for an antibody-MCG7 complex to form, and then detecting said complex. 

Yet another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock binding protein
or a derivative thereof.

Another aspect of the present invention is directed to an isolated nucleic acid molecule
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:8 defines the gene, mcg18. This gene encodes
a product, MCG18, having an amino acid sequence set forth in SEQ ID NO:9.

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg18, said method comprising determining the presence
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one
or both alleles of said mcg18 wherein the presence of such a nucleotide substitution, deletion
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg18, said method comprising screening for a single
or multiple amino acid substitution, deletion and/or addition to MCG18 wherein the presence of
such a mutation is indicative of said condition or of a propensity to develop said condition.

Another aspect of the present invention contemplates a method for detecting MCG18 or a
derivative thereof in a biological sample said method comprising contacting said biological
sample with an antibody specific for MCG18 or its derivatives or homologues for a time and 
under conditions sufficient for an antibody-MCG18 complex to form, and then detecting said 
complex. 

A summary of SEQ ID NOs. referred to in the subject specification is shown in Table 1.

TABLE 1
SUMMARY OF SEQ ID NOs.

SEQ ID NO.   DESCRIPTION

1            amino acid repeat sequence in DnaJ homologues
2            nucleotide sequence of mcg4
3            amino acid sequence of MCG4
4            nucleotide sequence of mcg7
5            amino acid sequence of MCG7
6            nucleotide sequence of mcg7 within exon of nucleotides 183-288
7            amino acid sequence of MCG7 within exon of nucleotides 183-288
8            nucleotide sequence of mcg18
9            amino acid sequence of MCG18
10-18        amino acid sequences identified using BESTFIT
19           sequence of pGEX and mcg7 junction
20           sequence of pGEX and mcg7 junction
21           nucleotide sequence of myc-tag/mcg7 junction
22           amino acid sequence corresponding to SEQ ID NO:21
23           nucleotide sequence of pGEX and mcg7 junction
24           amino acid sequence corresponding to SEQ ID NO:23
25-36        mcg7-specific oligonucleotides
37-45        mcg18-specific oligonucleotides

Single and three letter abbreviations for amino acid residues are shown in Table 2.

TABLE 2

Amino Acid       Three-letter     One-letter
                 Abbreviation     Symbol

Alanine          Ala              A
Arginine         Arg              R
Asparagine       Asn              N
Aspartic acid    Asp              D
Cysteine         Cys              C
Glutamine        Gln              Q
Glutamic acid    Glu              E
Glycine          Gly              G
Histidine        His              H
Isoleucine       Ile              I
Leucine          Leu              L
Lysine           Lys              K
Methionine       Met              M
Phenylalanine    Phe              F
Proline          Pro              P
Serine           Ser              S
Threonine        Thr              T
Tryptophan       Trp              W
Tyrosine         Tyr              Y
Valine           Val              V
Any residue      Xaa              X

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a representation of the nucleotide sequence [SEQ ID NO:2] and corresponding 
amino acid sequence [SEQ ID NO:3] of mcg4. 

Figure 2 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial murine expressed sequence tag (EST). 

Figure 3 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial nematode EST.

Figure 4 is a diagrammatic representation showing a predicted structure of MCG4 where H and 
C represent histidine and cysteine residues, respectively and X refers to any amino acid residue. 
Zn represents zinc atoms.

Figure 5 is a representation of a sensitive sequence homology search for related cysteine-containing
motifs in another Caenorhabditis elegans protein. 

Figure 6 is a representation showing that a related cysteine containing motif is present in the 
GATA-binding transcription factor from Schizosaccharomyces pombe.

Figure 7 is a Northern blot showing expression of mcg4 in various cultured human cancer cell 
lines. Lanes 1-5 show the hybridization signal from 15 µg total RNA derived
from various human cancer cell lines. Lanes 1-5, respectively, contain RNA from H69 lung
carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, HaCat transformed
keratinocytes and T24 bladder carcinoma cells.

Figure 8 is a representation of a partial alignment of mcg4 with human ESTs AA074703 and 
AA134788.

Figure 9 is a representation of the partial nucleotide sequence alignment between a human
(W32939) and mouse (AA242159) mcg4-like EST in the putative 5' UTR of the mcg4 cDNA.
The putative initiation codon is underlined and the region upstream represents 5' UTR. 

Figure 10 is a representation showing MacVector alignment of MCG4 with forward translations
of ESTs AA134788 and AA074703. The nucleotide sequences are shown in Figure 8.

Figure 11 is a diagrammatic representation of the domains of MCG4:

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C
acidic domain consensus: 9/34 amino acids negatively charged, 0/34 positively charged
basic domain consensus: 13/55 amino acids positively charged, 0/55 negatively charged
leucine zipper domain consensus: LX6LX6RX6LX6L

alternate "novel" leucine zipper-like motif where leucine would not be aligned along the one
surface of an alpha helix domain: (aa 261) LX6LXLX6LXLX6L (aa 286).
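
By way of illustration only, the zinc finger consensus listed for Figure 11 can be read as a spacing pattern in which each letter is a fixed residue and X followed by a number is a run of that many arbitrary residues. The following Python sketch converts such a consensus into a regular expression for scanning a protein sequence. The function name consensus_to_regex and the variable names are hypothetical and do not form part of the specification, and the sketch assumes the consensus is written in the compact form shown (any spaces are ignored).

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert a spacing consensus such as 'CX2HX4C' into a regular expression.

    Assumed reading: a letter other than X is a fixed residue, 'X' followed by
    a number n means any n residues, and a bare 'X' means any one residue.
    """
    compact = consensus.replace(" ", "")  # tolerate 'CX 2 HX 4 C' style spacing
    parts = []
    for residue, count in re.findall(r"([A-Z])(\d*)", compact):
        if residue == "X":
            parts.append(".{%s}" % (count or "1"))
        else:
            parts.append(residue)
    return "".join(parts)


# Consensus as given in the description of Figure 11.
ZINC_FINGER_CONSENSUS = "CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C"
ZINC_FINGER_PATTERN = re.compile(consensus_to_regex(ZINC_FINGER_CONSENSUS))
```

A candidate protein sequence held in a string could then be scanned with ZINC_FINGER_PATTERN.search(sequence). A match only indicates that the cysteine and histidine spacing is present and says nothing about whether the region folds about zinc atoms.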

Figure 12 is a representation showing similarity of MCG7 with GEFs of various organisms.

Figure 13(a) is a representation of the nucleotide sequence [SEQ ID NO:4] and corresponding
amino acid sequence [SEQ ID NO:5] of mcg7. Nucleotides 183-288 are an alternatively spliced
exon (shown in ^er case). 

Figure 13(b) is a representation of the partial nucleotide sequence [SEQ ID NO:6] and 
corresponding amino acid sequence [SEQ ID NO:7] of mcg7 but without the exon shown in Fig. 
13(a). Amino acids have been numbered from the first methionine codon (underlined). The 
cDNA molecules of Fig. 13(a) and Fig. 13(b) differ by the inclusion and exclusion of the exon 
of nucleotides 183-288.

Figure 14 is a representation showing a comparison between MCG7 and a homologue from 
Caenorhabditis elegans using the BESTFIT algorithm. In the figure, the following sequences
are underlined: 

EF-Hand = PROSITE DATABASE NO. PDOC00018

1a  nematode  DVDEEDEVEDEF [SEQ ID NO:10]
1b  human     DVDGDGHISQEEF [SEQ ID NO:11]
    nematode  DHDRDGFISQEEF [SEQ ID NO:12]
1c  human     DQNQDGCISREEM [SEQ ID NO:13]
    nematode  DVDMDGQISKDEL [SEQ ID NO:14]

GUANINE NT BINDING REGION = BLOCKS DATABASE NO. BL00720B

2   human     HFVHVAEKLLQLQNFNTLMAWGGLSHSSISRLKETH [SEQ ID NO:15]
    nematode  KFVHVAKHLRKINNFNTLMSVVGGITHSSVARLAKTY [SEQ ID NO:16]

DAG-PE BINDING DOMAIN = PROSITE DATABASE NO. PDOC00379

3   human     HNFQESNSLRPVACRHCKALILGIYKQGLKCRACGVNCHKQCKDRLSVEC [SEQ ID NO:17]
    nematode  HNFHETTTLTPTTCNHCNKLLWGILRQGFKCKDCGLAVHSCCKSNAVAEC [SEQ ID NO:18]

Figure 15 is a representation of an alignment of human and a partial (5' UTR and partial coding
sequence) murine mcg7 cDNA (GenBank Acc. No. W71787 and AA237373). The putative
initiation codon is underlined. The murine sequence represents a composite of 2 partial cDNA
sequences from the EST database (accession numbers W71787 and AA237373). Nucleotide 
differences between human and murine sequences are shown in lower case lettering and identical 
residues are indicated with asterisks. 

Figure 16 is a representation of further 5' nucleotide and corresponding amino acid sequence for
human mcg7. Nucleotide positions 1-321 were derived from GenBank Acc. No. AC000134 and
nucleotides 322 onwards from Fig. 13(a). Two in-frame initiation codons are underlined. 
Asterisks denote in-frame stop codons. 

Figure 17 is a graphical representation of a GDP release assay. □ Experiment #1 (mean of
duplicates). ○ Experiment #2 (mean of duplicates). The exchange reaction contained 36 pmol
of GST-MCG7 (N-terminally truncated; encoded by Construct B in Fig. 18) and 1.6-12.8 pmol
of recombinant GST-N-Ras.GDP. Reaction time 6 min.
Estimated reaction constants:
Km = 2.1 µM, Vmax = 37 pmol/6 min/36 pmol [Expt #1]
Km = 1.5 µM, Vmax = 30.3 pmol/6 min/36 pmol [Expt #2]

Figure 18 depicts various recombinant plasmids containing partial or full-length mcg7. 

Figure 19 is a representation of the nucleotide sequence [SEQ ID NO:8] and corresponding 
amino acid sequence [SEQ ID NO:9] of mcg18.

Figure 20 is a representation showing that MCG18 has partial homology to E. coli DnaJ. 

Figure 21 is a representation showing that MCG18 has homology to two Caenorhabditis elegans
proteins.

Figure 22 is a representation showing that MCG18 has homology to a Schizosaccharomyces pombe
protein. 

Figure 23 is a representation showing homology of MCG18 to a Drosophila virilis protein.

Figure 24 is a representation showing homology of MCG18 to human DnaJ proteins HDJ- 
2/HSDJ, HDJ-1/HSP40 and HSJ1. 

Figure 25 is a representation of the nucleotide and corresponding amino acid sequence of murine
mcg18.

Figure 26 is a representation of homology between human and murine MCG18. 

Figure 27 depicts nucleotide sequences corresponding to the 5' untranslated region of human
mcg18.

Figure 28 depicts a Northern blot showing expression of mcg18 transcripts in total RNA isolated
from various human cancer cell lines grown in culture. Lanes 1-5 respectively contain 15 µg
RNA from H69 lung carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells,
HaCat transformed keratinocytes and T24 bladder carcinoma cells.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a regulator of gene expression or a derivative of said gene regulator.

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
regulator of gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2
type.

Still more particularly, the present invention provides an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii).
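
As an illustrative aside, the "complementary form" of a nucleotide sequence referred to above can be derived computationally. The following Python sketch, which is not part of the specification and whose function name reverse_complement is hypothetical, returns the reverse complement of a DNA sequence written 5' to 3', assuming only the unambiguous bases A, C, G and T.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'.

    Only A, C, G and T (either case) are translated; any other character
    is passed through unchanged.
    """
    return sequence.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which gives a simple sanity check on the result.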

The present invention also provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5
or 7;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock-binding protein 
or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 



(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Preferably, the percentage similarity is at least about 50%. More preferably, the percentage 
similarity is at least about 60%. 



Reference herein to a low stringency at 42°C includes and encompasses from at least about 1%
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative
stringency conditions may be applied where necessary, such as medium stringency, which
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M
to at least about 0.9M salt for washing conditions, or high stringency, which includes and
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least
about 0.15M salt for washing conditions.
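
For illustration only, the formamide and salt ranges recited above for hybridisation can be expressed as a small lookup. The following Python sketch is not part of the specification. The names STRINGENCY_RANGES and classify_stringency are hypothetical, and the sketch assumes that each recited range is treated as a closed interval, so that conditions falling between the recited ranges are reported as unclassified.

```python
from typing import Optional

# Ranges as recited in the text: formamide in % v/v, salt in mol/L (M).
STRINGENCY_RANGES = {
    "low": {"formamide": (1.0, 15.0), "salt": (1.0, 2.0)},
    "medium": {"formamide": (16.0, 30.0), "salt": (0.5, 0.9)},
    "high": {"formamide": (31.0, 50.0), "salt": (0.01, 0.15)},
}


def classify_stringency(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return 'low', 'medium' or 'high' for hybridisation conditions.

    Returns None when the pair of values does not fall inside any one
    recited range (an assumption of this sketch, not a rule from the text).
    """
    for name, ranges in STRINGENCY_RANGES.items():
        f_lo, f_hi = ranges["formamide"]
        s_lo, s_hi = ranges["salt"]
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return name
    return None
```

For example, 10% v/v formamide with 1.5M salt falls in the low stringency range, whereas 10% v/v formamide with 0.1M salt mixes a low stringency formamide level with a high stringency salt level and is left unclassified.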

The term "similarity" as used herein includes exact identity between compared sequences at the 
nucleotide or amino acid level. Where there is non-identity at the nucleotide level, "similarity"
includes differences between sequences which result in different amino acids that are nevertheless 
related to each other at the structural, functional, biochemical and/or conformational levels. 
Where there is non-identity at the amino acid level, "similarity" includes amino acids that are 
nevertheless related to each other at the structural, functional, biochemical and/or conformational 
levels.

The present invention extends to nucleic acid molecules with percentage similarities of 
approximately 65%, 70%, 75%, 80%, 85%, 90% or 95% or above or a percentage in between. 
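
By way of a hedged example, the distinction drawn above between exact identity and broader similarity can be sketched for two amino acid sequences that have already been aligned to equal length, with "-" marking gaps. The following Python sketch is not part of the specification. The grouping of related residues in CONSERVATIVE_GROUPS is an assumption chosen for illustration, the function names are hypothetical, and the specification does not prescribe any particular grouping, scoring scheme or denominator.

```python
# Assumed groups of residues treated as structurally or functionally related.
CONSERVATIVE_GROUPS = ("AVLIM", "FWY", "KRH", "DE", "STNQ")


def _check_aligned(seq_a: str, seq_b: str) -> None:
    """Require two non-empty, pre-aligned sequences of equal length."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal aligned length")


def _related(a: str, b: str) -> bool:
    """True when both residues fall in the same assumed conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both sequences."""
    _check_aligned(seq_a, seq_b)
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def percent_similarity(seq_a: str, seq_b: str) -> float:
    """Percentage of columns that are identical or hold related residues."""
    _check_aligned(seq_a, seq_b)
    hits = sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != "-" and (a == b or _related(a, b))
    )
    return 100.0 * hits / len(seq_a)
```

Under these assumptions the similarity figure is never lower than the identity figure for the same alignment, since every identical column also counts as similar.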

The nucleic acid molecule of the present invention defined by SEQ ID NO:2 is hereinafter
referred to as constituting the "mcg4" gene. The protein encoded by mcg4 is referred to herein
as "MCG4" and has an amino acid sequence set forth in SEQ ID NO:3. The mcg4 gene is
proposed to encode, in accordance with the present invention, a regulator of gene expression
which comprises a novel zinc finger domain, (HC3)2. A regulator of gene expression includes a
transcription factor. Regulation may be at the level of nucleic acid:protein or protein:protein
interaction.

The nucleic acid molecule of the present invention defined by SEQ ID NO:4 or 6 is hereinafter 
referred to as constituting the "mcg7" gene. The protein encoded by mcg7 is referred to herein
as "MCG7" and has an amino acid sequence set forth in SEQ ID NO:5 or 7 and is involved in
signal transduction. The difference in the nucleotide and amino acid sequence is due to the 
presence or absence of an exon at nucleotides 183-288. 

The nucleic acid molecule of the present invention defined by SEQ ID NO:8 is hereinafter 
referred to as constituting the "mcg18" gene. The protein encoded by mcg18 is referred to
herein as "MCG18" and comprises the amino acid sequence set forth in SEQ ID NO:9.

The present invention extends to the naturally occurring genomic mcg4, mcg7 and mcg18
nucleotide sequences or corresponding cDNA sequences or to derivatives thereof. Derivatives
contemplated in the present invention include fragments, parts, portions, mutants, homologues
and analogues of MCG4, MCG7 or MCG18 or the corresponding genetic sequences. Derivatives
also include single or multiple amino acid substitutions, deletions and/or additions to MCG4,
MCG7 or MCG18 or single or multiple nucleotide substitutions, deletions and/or additions to
mcg4, mcg7 or mcg18. "Additions" to the amino acid or nucleotide sequences include fusions
with other peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference
herein to "MCG4" or "mcg4", "MCG7" or "mcg7" or "MCG18" or "mcg18" includes reference to
all derivatives thereof including functional derivatives and immunologically interactive derivatives
of MCG4, MCG7 or MCG18.

The mcg4, mcg7 and mcg18 of the present invention are particularly exemplified herein from
humans and in particular from human chromosome 11q13.

The present invention extends, however, to a range of homologues from, for example, primates, 
livestock animals (eg. sheep, cows, horses, donkeys, pigs), companion animals (eg. dogs, cats),
laboratory test animals (eg. rabbits, mice, rats, guinea pigs), reptiles, birds (eg. chickens, ducks,
geese, parrots), insects, nematodes, eukaryotic microorganisms and captive wild animals (eg.
deer, foxes, kangaroos). Reference herein to mcg4, mcg7 and mcg18 or their respective proteins
MCG4, MCG7 and MCG18 includes reference to these molecules of human origin as well as 
novel forms of non-human origin. 

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic 
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic
acid molecules of the present invention are generally mRNA. 

Although the nucleic acid molecules of the present invention are generally in isolated form, they 
may be integrated into or ligated to or otherwise fused or associated with other genetic 
molecules such as vector molecules and in particular expression vector molecules. Vectors and
expression vectors are generally capable of replication and, if applicable, expression in one or
both of a prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli,
Bacillus sp and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian 
and insect cells. 

Accordingly, another aspect of the present invention contemplates a genetic construct comprising
a vector portion and an animal, more particularly a mammalian and even more particularly a 
human mcg4 gene portion, which mcg4 gene portion is capable of encoding an MCG4 
polypeptide or a functional or immunologically interactive derivative thereof. 

Preferably, the mcg4 gene portion of the genetic construct is operably linked to a promoter in
the vector such that said promoter is capable of directing expression of said mcg4 gene portion 
in an appropriate cell. 

In addition, the mcg4 gene portion of the genetic construct may comprise all or part of the gene 
fused to another genetic sequence such as a nucleotide sequence encoding glutathione-S-
transferase or part thereof. 

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG4 is a transcription factor 
involved in gene regulation. Mutations in mcg4 may result in aberrations in gene regulation 
leading to the development of or a propensity to develop various types of cancer. In this regard, 
although not wishing to limit the present invention to any one hypothesis or mode of action, it
is proposed that mcg4 or its expression product may be involved in the tissue-specific or
temporal regulation of particular genes. 

A deletion or aberration in the mcg4 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of a subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg4, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other 
aberration to one or both alleles of said mcg4 wherein the presence of such a nucleotide 
substitution, deletion and/or addition or other aberration may be indicative of said condition or 
a propensity to develop said condition. 

Another aspect of the present invention contemplates a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg7 
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a
functional or immunologically interactive derivative thereof. 

Preferably, the mcg7 gene portion of the genetic construct is operably linked to a promoter on 
the vector such that said promoter is capable of directing expression of said mcg7 gene portion 
in an appropriate cell. 

In addition, the mcg7 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same.

It is proposed in accordance with the present invention that MCG7 is a GEF involved in signal 
transduction. Mutations in mcg7 or MCG7 may result in defective control of cell proliferation 
leading to the development of or a propensity to develop various types of cancer. 

A deletion or aberration in the mcg7 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection 
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may 
be determined by assaying for aberrations in the parents of a subject under investigation. 

According to this aspect of the present invention, there is contemplated a method of detecting 
a condition caused or facilitated by an aberration in mcg7, said method comprising determining 
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other 
aberration to one or both alleles of said mcg7 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition. 

Yet another aspect of the present invention contemplates a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human 
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof. 

Preferably, the mcg18 gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said mcg18 gene portion
in an appropriate cell.

In addition, the mcg18 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG18 is a transcription factor
involved in protein folding, protein complex assembly and transit through subcellular
compartments. MCG18 may also have a role in tumour suppression. Thus mutations in mcg18
may result in the development of or a propensity to develop various types of cancer.

A deletion or aberration in the mcg18 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may 
be determined by assaying for aberrations in the parents and/or proband of the subject under 
investigation. 

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg18, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg18 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

The nucleotide substitutions, additions or deletions may be detected by any convenient means 
including nucleotide sequencing, restriction fragment length polymorphism (RFLP), polymerase 
chain reaction (PCR), oligonucleotide hybridization and single stranded conformation 
polymorphism analysis (SSCP) amongst many others. An aberration includes modification to
existing nucleotides such as to modify a glycosylation signal, amongst other effects.
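
As an illustration only, once a sample has been sequenced, single nucleotide substitutions can be located by a position-by-position comparison with a reference sequence. The following minimal Python sketch assumes two pre-aligned fragments of equal length. The function name and the 12-base fragments are hypothetical and are not taken from the mcg4, mcg7 or mcg18 sequences; additions and deletions would first require an alignment step that is not shown.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        # Insertions or deletions change the length; those need an alignment first.
        raise ValueError("sequences differ in length; align them before comparing")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(
            zip(reference.upper(), sample.upper())
        )
        if ref_base != sample_base
    ]


# Hypothetical fragments used purely for illustration.
reference_fragment = "ATGTGGAAATAA"
sample_fragment = "ATGTGAAAATAA"
print(find_substitutions(reference_fragment, sample_fragment))  # [(6, 'G', 'A')]
```

A heterozygous sample would in practice give two sequences, one per allele, each of which could be compared with the reference in the same way.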

In an alternative method, aberrations in the mcg4, mcg7 and mcg18 genes are detected by
screening for mutations in MCG4, MCG7 and MCG18, respectively. 

A mutation in MCG4, MCG7 or MCG18 may be a single or multiple amino acid substitution,
addition and/or deletion. The mutation in mcg4, mcg7 or mcg18 may also result in either no
translation product being produced or a product in truncated form. A mutation may also result
in an altered glycosylation pattern or the introduction of side chain modifications to amino acid
residues.

According to this aspect of the present invention, there is provided a method of detecting a
condition caused or facilitated by an aberration in mcg4, mcg7 or mcg18, said method comprising
screening for a single or multiple amino acid substitution, deletion and/or addition to MCG4,
MCG7 or MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.
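
The relationship between a nucleotide change and its effect on the translation product can be sketched as follows. This is a simplified illustration in Python that assumes the standard genetic code and a coding sequence read in frame from its first base. It does not model translation initiation, splicing or post-translational modification, and the function names and example sequences are hypothetical rather than taken from mcg4, mcg7 or mcg18.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    residues = []
    for start in range(0, len(coding_sequence) - 2, 3):
        residue = CODON_TABLE[coding_sequence[start:start + 3].upper()]
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


def classify_mutation(wild_type: str, mutant: str) -> str:
    """Compare translation products of a wild-type and a mutant coding sequence."""
    wild_type_product = translate(wild_type)
    mutant_product = translate(mutant)
    if mutant_product == wild_type_product:
        return "no change in product"
    if not mutant_product:
        return "no translation product"
    if len(mutant_product) < len(wild_type_product) and wild_type_product.startswith(mutant_product):
        return "truncated product"
    return "altered product"


# Hypothetical 12-base coding sequence encoding Met-Trp-Lys followed by a stop codon.
wild_type = "ATGTGGAAATAA"
print(classify_mutation(wild_type, "ATGTGAAAATAA"))  # TGG -> TGA: truncated product
print(classify_mutation(wild_type, "ATGTGGGAATAA"))  # AAA -> GAA: altered product
```

The categories returned are a coarse approximation of the outcomes described above; an altered glycosylation pattern, for example, cannot be inferred from the sequence comparison shown here.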

A particularly convenient means of detecting a mutation in MCG4, MCG7 or MCG18 is by use 
of antibodies. 

Accordingly, another aspect of the present invention is directed to antibodies to MCG4, MCG7
or MCG18 and its derivatives. Such antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to MCG4, MCG7 or MCG18 or may be specifically
raised to MCG4, MCG7 or MCG18 or derivatives thereof. In the case of the latter, MCG4,
MCG7 or MCG18 or their derivatives may first need to be associated with a carrier molecule.

The antibodies to MCG4, MCG7 or MCG18 of the present invention are particularly useful as
diagnostic agents. 

For example, antibodies to MCG4, MCG7 or MCG18 and their derivatives can be used to screen
for wild-type MCG4, MCG7 or MCG18 or for mutated MCG4, MCG7 or MCG18 molecules.
The latter may occur, for example, during or prior to certain cancer development. A differential
binding assay is also particularly useful. Techniques for such assays are well known in the art
and include, for example, sandwich assays and ELISA. Knowledge of normal MCG4, MCG7
or MCG18 levels or the presence of wild-type MCG4, MCG7 or MCG18 may be important for
diagnosis of certain cancers or a predisposition for development of cancers or for monitoring
certain therapeutic protocols.

As stated above, antibodies to MCG4, MCG7 or MCG18 of the present invention may be
monoclonal or polyclonal or may be fragments of antibodies such as Fab fragments.
Furthermore, the present invention extends to recombinant and synthetic antibodies and to
antibody hybrids. A "synthetic antibody" is considered herein to include fragments and hybrids
of antibodies.

For example, specific antibodies can be used to screen for wild-type MCG4, MCG7 or MCG18
molecules or specific mutant molecules such as molecules having a certain deletion. This would
be important, for example, as a means for screening for levels of MCG4, MCG7 or MCG18 in
a cell extract or other biological fluid or purifying MCG4, MCG7 or MCG18 made by
recombinant means from culture supernatant fluid or purified from a cell extract. Techniques for
the assays contemplated herein are known in the art and include, for example, sandwich assays
and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal 
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays or a first 
antibody may be used with a commercially available anti-immunoglobulin antibody. An antibody 
as contemplated herein includes any antibody specific to any region of wild-type MCG4, MCG7 
or MCG18 or to a specific mutant phenotype or to a deleted or otherwise altered region. 

Both polyclonal and monoclonal antibodies are obtainable by immunization of a suitable animal 
or bird with MCG4, MCG7 or MCG18 or its derivatives and either type is utilizable for 
immunoassays. The methods of obtaining both types of sera are well known in the art. 
Polyclonal sera are less preferred but are relatively easily prepared by injection of a suitable 
laboratory animal or bird with an effective amount of MCG4, MCG7 or MCG18 or antigenic
parts thereof or derivatives thereof, collecting serum from the animal or bird, and isolating 
specific sera by any of the known immunoadsorbent techniques. Although antibodies produced 
by this method are utilizable in virtually any type of immunoassay, they are generally less 
favoured because of the potential heterogeneity of the product. 

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the 
ability to produce them in large quantities and the homogeneity of the product. The preparation 
of hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell 
line and lymphocytes sensitized against the immunogenic preparation can be done by techniques 
which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting MCG4, MCG7 or
MCG18 or a derivative thereof in a biological sample, said method comprising contacting said
biological sample with an antibody specific for MCG4, MCG7 or MCG18 or its derivatives or
homologues for a time and under conditions sufficient for an antibody-MCG4, MCG7 or
MCG18 complex to form, and then detecting said complex.

Preferably, the biological sample is a cell extract from a human or other animal or a bird. 

The presence of MCG4, MCG7 or MCG18 may be determined in a number of ways such as
by Western blotting and ELISA procedures. A wide range of immunoassay techniques are
available as can be seen by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653.
These include both single-site and two-site or "sandwich" assays of the non-competitive types, 
as well as traditional competitive binding assays. These assays also include direct binding of a 
labelled antibody to a target. 

Sandwich assays are among the most useful and commonly used assays and are favoured for use
in the present invention. A number of variations of the sandwich assay technique exist, and all
are intended to be encompassed by the present invention. Briefly, in a typical forward assay, an
unlabelled antibody is immobilized on a solid substrate and the sample to be tested brought into
contact with the bound molecule. After a suitable period of incubation, for a period of time
sufficient to allow formation of an antibody-antigen complex, a second antibody specific to the
antigen, labelled with a reporter molecule capable of producing a detectable signal, is then added
and incubated, allowing time sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted material is washed away, and the presence of
the antigen is determined by observation of a signal produced by the reporter molecule. The
results may either be qualitative, by simple observation of the visible signal, or may be
quantitated by comparing with a control sample containing known amounts of hapten. Variations
on the forward assay include a simultaneous assay, in which both sample and labelled antibody
are added simultaneously to the bound antibody. These techniques are well known to those skilled
in the art, including any minor variations as will be readily apparent. In accordance with the
present invention the sample is one which might contain MCG4, MCG7 or MCG18 including cell
extract or tissue biopsy. The sample is, therefore, generally a biological sample comprising
biological fluid but also extends to fermentation fluid and supernatant fluid such as from a cell
culture.

In the typical forward sandwich assay, a first antibody having specificity for the MCG4, MCG7
or MCG18 or an antigenic part thereof or a derivative thereof or antigenic parts thereof, is either
covalently or passively bound to a solid surface. The solid surface is typically glass or a polymer,
the most commonly used polymers being cellulose, polyacrylamide, nylon, polystyrene, polyvinyl
chloride or polypropylene. The solid supports may be in the form of tubes, beads, discs or
microplates, or any other surface suitable for conducting an immunoassay. The binding
processes are well known in the art and generally consist of cross-linking, covalently binding or
physically adsorbing. The polymer-antibody complex is washed in preparation for the test sample.
An aliquot of the sample to be tested is then added to the solid phase complex and incubated for
a period of time sufficient (e.g. 2-40 minutes or overnight if more convenient) and under suitable
conditions (e.g. from room temperature to 37 °C) to allow binding of any subunit present in the
antibody. Following the incubation period, the antibody subunit solid phase is washed and dried
and incubated with a second antibody specific for a portion of the hapten. The second antibody
is linked to a reporter molecule which is used to indicate the binding of the second antibody to
the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody which may or may not be labelled with
a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex. The
complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most commonly
used reporter molecules in this type of assay are either enzymes, fluorophores or radionuclide
containing molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield a
fluorescent product rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further quantitated,
usually spectrophotometrically, to give an indication of the amount of hapten which was present
in the sample. "Reporter molecule" also extends to use of cell agglutination or inhibition of
agglutination such as red blood cells on latex beads, and the like.
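
By way of illustration, the spectrophotometric quantitation mentioned above is commonly carried out against control samples containing known amounts of the analyte. The following Python sketch assumes a set of standards measured in the same assay and uses piecewise linear interpolation between neighbouring standards. The function name, the concentrations (in arbitrary units) and the absorbance readings are hypothetical, and other curve-fitting approaches could equally be used.

```python
def concentration_from_absorbance(standards, absorbance):
    """Estimate a concentration by linear interpolation between standards.

    standards: iterable of (known concentration, measured absorbance) pairs.
    absorbance: reading obtained for the test sample.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (conc_low, abs_low), (conc_high, abs_high) in zip(points, points[1:]):
        if abs_low <= absorbance <= abs_high:
            if abs_high == abs_low:
                # Two standards with identical readings: no slope to interpolate on.
                return conc_low
            fraction = (absorbance - abs_low) / (abs_high - abs_low)
            return conc_low + fraction * (conc_high - conc_low)
    raise ValueError("absorbance lies outside the range covered by the standards")


# Hypothetical standard curve: (concentration, absorbance) pairs.
standard_curve = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
estimate = concentration_from_absorbance(standard_curve, 0.35)
print(round(estimate, 3))  # midway between the 10 and 20 unit standards
```

Readings outside the range of the standards are rejected rather than extrapolated, since the response of an enzyme-linked assay is not generally linear beyond the calibrated range.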

Alternatively, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the fluorescent
labelled antibody is allowed to bind to the first antibody-hapten complex. After washing off the
unbound reagent, the remaining tertiary complex is then exposed to light of the appropriate
wavelength; the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

As stated above, the present invention extends to genetic constructs capable of encoding MCG4,
MCG7 or MCG18 or functional derivatives thereof. Such genetic constructs are also
contemplated to be useful in modulating expression of specific genes in which mcg4, mcg7 or
mcg18 is involved in tissue-specific or temporal regulation.

Accordingly, another aspect of the present invention is directed to a genetic construct comprising
a nucleotide sequence encoding a peptide, polypeptide or protein and mcg4, mcg7 or mcg18 or
a functional derivative or homologue thereof capable of modulating the expression of said 
nucleotide sequence. 

As stated above, MCG18 is proposed to have a role in tumour suppression. Accordingly, it is
further proposed in accordance with the present invention to use recombinant MCG18 in
pharmaceutical preparations for treating, arresting or otherwise ameliorating the effects of certain
cancers. 

Accordingly, another aspect of the present invention contemplates a method for treating,
arresting or otherwise ameliorating the effects of a cancer in an animal or bird, said method 
comprising administering to said animal or bird an effective amount of MCG18 or a functional 
derivative thereof for a time and under conditions sufficient to treat, arrest or otherwise 
ameliorate the effects of said cancer. 

The present invention, therefore, contemplates a pharmaceutical composition comprising 
MCG18 or a derivative thereof or a modulator of mcg18 expression or MCG18 activity and one
or more pharmaceutically acceptable carriers and/or diluents. These components are referred
to hereinafter as the "active ingredients". The active ingredients may also include anti-cancer
agents or agents which facilitate actions of MCG18.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions (where
water soluble) and sterile powders for the extemporaneous preparation of sterile injectable
solutions. Such forms must be stable under the conditions of manufacture and storage and must be
preserved against the contaminating action of microorganisms such as bacteria and fungi. The
carrier may be a solvent medium containing, for example, water, ethanol, polyol (for example,
glycerol, propylene glycol and liquid polyethylene glycol and the like), suitable mixtures thereof,
and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating
such as lecithin and by the use of surfactants. The prevention of the action of microorganisms
can be brought about by various antibacterial and antifungal agents, for example, parabens,
chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable to
include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use in the compositions of agents delaying
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various of the other ingredients enumerated above, as
required, followed by filter sterilization. In the case of sterile powders for the preparation of
sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique which yield a powder of the active ingredient plus any additional desired
ingredient from a previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in a hard or
soft shell gelatin capsule, or they may be compressed into tablets, or they may be incorporated
directly with the food of the diet. For oral therapeutic administration, the active compound may be
incorporated with excipients and used in the form of ingestible tablets, buccal tablets, troches,
capsules, elixirs, suspensions, syrups, wafers, and the like. Such compositions and preparations
should contain at least 1% by weight of active compound. The percentage of the compositions
and preparations may, of course, be varied and may conveniently be between about 5% and about
80% of the weight of the unit. The amount of active compound in such therapeutically useful
compositions is such that a suitable dosage will be obtained. Preferred compositions or
preparations according to the present invention are prepared so that an oral dosage unit form
contains between about 0.1 µg and 2000 mg of active compound.

The tablets, troches, pills, capsules and the like may also contain the components as listed
hereafter: a binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin may be added or a flavouring agent such as peppermint, oil of wintergreen, or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and flavouring
such as cherry or orange flavour. Of course, any material used in preparing any dosage unit form
should be pharmaceutically pure and substantially non-toxic in the amounts employed. In
addition, the active compound(s) may be incorporated into sustained-release preparations and
formulations.

The present invention also extends to forms suitable for topical application such as creams, 
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion 
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and 
the like. The use of such media and agents for pharmaceutical active substances is well known 
in the art. Except insofar as any conventional media or agent is incompatible with the active
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active 
ingredients can also be incorporated into the compositions. 

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 µg to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.

Effective amounts contemplated by the present invention include those amounts effective to
ameliorate a condition. For example, it is envisaged that effective amounts would range from
about 0.001 µg/kg body weight to about 100 mg/kg body weight. Alternatively, effective
amounts may range from about 0.01 µg/kg body weight to about 10 mg/kg body weight or even
from about 0.1 µg/kg body weight to about 1 mg/kg body weight. Administration may be per
minute, hour, day, week, month or year or may only be a once-off administration.
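
As a worked illustration of how the per-kilogram ranges above scale with body weight, the following Python sketch converts each range into an absolute amount for a subject of a given weight. It assumes that the lower bounds are expressed in µg/kg and the upper bounds in mg/kg. The labels "broad", "intermediate" and "narrow" and the 70 kg example weight are hypothetical and are used only to distinguish the three ranges; nothing here is a dosing recommendation.

```python
UG_PER_MG = 1000.0

# (lower bound, upper bound) in µg per kg body weight; labels are illustrative only.
RANGES_UG_PER_KG = {
    "broad": (0.001, 100 * UG_PER_MG),
    "intermediate": (0.01, 10 * UG_PER_MG),
    "narrow": (0.1, 1 * UG_PER_MG),
}


def amount_range_ug(body_weight_kg: float, level: str = "broad") -> tuple[float, float]:
    """Return the (minimum, maximum) amount in µg for a subject of the given weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    lower, upper = RANGES_UG_PER_KG[level]
    return lower * body_weight_kg, upper * body_weight_kg


# Hypothetical 70 kg subject.
for level in RANGES_UG_PER_KG:
    minimum, maximum = amount_range_ug(70.0, level)
    print(f"{level}: {minimum:.3f} µg to {maximum / UG_PER_MG:.0f} mg")
```

Each successive range lies inside the previous one, so any amount within the narrowest range also falls within the broader two.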

The pharmaceutical composition may also comprise genetic molecules such as a vector capable 
of transfecting target cells where the vector carries a nucleic acid molecule capable of modulating
mcg18 expression or MCG18 activity. The vector may, for example, be a viral vector.

As stated above, the present invention further contemplates a range of derivatives of MCG18.
Derivatives include fragments, parts, portions, mutants, homologues and analogues of the
MCG18 polypeptide and corresponding genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or additions to MCG18 or single or multiple
nucleotide substitutions, deletions and/or additions to the genetic sequence encoding MCG18.
"Additions" to amino acid sequences or nucleotide sequences include fusions with other
peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference herein to
"MCG18" includes reference to all derivatives thereof including functional derivatives or MCG18
immunologically interactive derivatives.

Analogues of MCG18 contemplated herein include, but are not limited to, modification to side
chains, incorporation of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with
pyridoxal-5-phosphate followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic 
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal.

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or 
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides. 
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine
and/or D-isomers of amino acids. A list of unnatural amino acids contemplated herein is shown in
Table 3.

TABLE 3

Non-conventional              Code    Non-conventional              Code
amino acid                            amino acid

α-aminobutyric acid           Abu     L-N-methylalanine             Nmala
α-amino-α-methylbutyrate      Mgabu   L-N-methylarginine            Nmarg
aminocyclopropane-            Cpro    L-N-methylasparagine          Nmasn
carboxylate                           L-N-methylaspartic acid       Nmasp
aminoisobutyric acid          Aib     L-N-methylcysteine            Nmcys
aminonorbornyl-               Norb    L-N-methylglutamine           Nmgln
carboxylate                           L-N-methylglutamic acid       Nmglu
cyclohexylalanine             Chexa   L-N-methylhistidine           Nmhis
cyclopentylalanine            Cpen    L-N-methylisoleucine          Nmile
D-alanine                     Dal     L-N-methylleucine             Nmleu
D-arginine                    Darg    L-N-methyllysine              Nmlys
D-aspartic acid               Dasp    L-N-methylmethionine          Nmmet
D-cysteine                    Dcys    L-N-methylnorleucine          Nmnle
D-glutamine                   Dgln    L-N-methylnorvaline           Nmnva
D-glutamic acid               Dglu    L-N-methylornithine           Nmorn
D-histidine                   Dhis    L-N-methylphenylalanine       Nmphe
D-isoleucine                  Dile    L-N-methylproline             Nmpro
D-leucine                     Dleu    L-N-methylserine              Nmser
D-lysine                      Dlys    L-N-methylthreonine           Nmthr
D-methionine                  Dmet    L-N-methyltryptophan          Nmtrp
D-ornithine                   Dorn    L-N-methyltyrosine            Nmtyr
D-phenylalanine               Dphe    L-N-methylvaline              Nmval
D-proline                     Dpro    L-N-methylethylglycine        Nmetg
D-serine                      Dser    L-N-methyl-t-butylglycine     Nmtbug
D-threonine                   Dthr    L-norleucine                  Nle


D-tryptophan 


Dtrp 


L-norvaline 


Nva 
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D-tyrosine 


Dtyr 


D-valine 


Dval 


D-oc-methylalanine 


Dmala 


D-a-methylarginine 


Dmarg 


5 D-a-methylasparagine 


Dmasn 


D-a-methylaspartate 


Dmasp 


D-a-methylcysteine 


Dmcys 


D-a-methy)glutamine 


Dmgln 


D-a-methylhistidine 


Dmhis 


10 D-a-methylisoleucine 


Dmile 


D-a-methylleucine 


Dmleu 


D-a-methyllysine 


Dmlys 


D-a-methylmethionine 


Dmmet 


D-a-methylornithine 


Dmorn 


15 D-a-methylphenylalanine 


Dmphe 


D-a-methylproline 


Dmpro 


D-a-methylserine 


Dmser 


D-a-methylthreonine 


Dmthr 


D-a-methyltryptophan 


Dmtrp 


20 D-a-methyltyrosine 


Dmty 


D-a-methyl valine 


Dmval 


D-N-methylalanine 


Dnmala 


D-N-methylarginine 


Dnmarg 


D-N-methylasparagine 


Dnmasn 


25 D-N-methylaspartate 


Dnmasp 


D-N-methylcysteine 


Dnmcys 


D-N-methylglutamine 


Dnmgln 


D-N-methylglutamate 


Dnmglu 


D-N-methylhistidine 


Dnmhis 


30 D-N-methylisoleucine 


Dnmile 


D-N-methylleucine 


Dnmleu 



a-methyl-aminoisobutyrate 


Maib 


a-methyl-Y-aminobutyrate 


Mgabu 


a-methylcyclohexylalanine 


Mchexa 


a-methylcylcopentylalanine 


Mcpen 


a-methyl-a-napthylalanine 


Manap 


a-methylpenicillamine 


Mpen 


N-(4-aminobutyl)glycine 


Nglu 


N-(2-aminoethyl)glycine 


Naeg 


N-(3-aminopropyl)glycine 


Norn 


N-amino-cc-methylbutyrate 


Nmaabu 


a-napthylalanine 


Anap 


N-benzylglycine 


Nphe 


N-(2-carbamylethyl)glycine 


Ngln 


N-(carbamylmethyl)glycine 


Nasn 


N-(2-carboxyethyl)glycine 


Nglu 


N-(carboxymethyl)glycine 


Nasp 


N-cyclobutylglycine 


Ncbut 


N-cycloheptylglycine 


Nchep 


N-cyclohexylglycine 


Nchex 


N-cyclodecylglycine 


Ncdec 


N-cylcododecylglycine 


Ncdod 


N-cyclooctylglycine 


Ncoct 


N-cyclopropylglycine 


Ncpro 


N-cycloundecylglycine 


Ncund 


N-(2,2-diphenylethyl)glycine 


Nbhm 


N-(3,3-diphenylpropyl)gIycine 


Nbhe 


N-(3-guanidinopropyl)glycine 


Narg 


N-( 1 -hydroxyethy l)glycine 


Nthr 


N-(hydroxyethyl))glycine 


Nser 


N-(imidazolylethyl))glycine 


Nhis 


N-(3-indolylyethyl)glycine 


Nhtrp 
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D-N-methyllysine 


Dnmlys 


N-methylcyclohexylalanine 


Nmchexa 


D-N-methylornithine 


Dnmom 


N-methylglycine 


Nala 


5 N-methylaminoisobutyrate 


Nmaib 


N-( l-methylpropyl)glycine 


Nile 


N-(2-methylpropyl)glycine 


Nleu 


D-N-methyltryptophan 


Dnmtrp 


D-N-methyltyrosine 


Dnmtyr 


10 D-N-methylvaline 


Dnmval 


Y-aminobutyric acid 


Gabu 


L-r-butylglycine 


Tbug 


L-ethylglycine 


Etg 


L-homophenylalanine 


Hphe 


15 L-a-methylarginine 


Marg 


L-a-methylaspartate 


Masp 


L-oc-methylcysteine 


Mcys 


L-a-methylglutamine 


Mgln 


L-a-methylhistidine . 


Mhis 


20 L-a-methylisoleucine 


Mile 


L-a-methylleucine 


Mleu 


L-a-methylmethionine 


Mmet 


L-a-methylnorvaline 


Mnva 


L-a-methylphenylalanine 


Mphe 


25 L-a-methylserine 


Mser 


L-a-methyltryptophan 


Mtrp 



N-methy 1- y -aminobutyrate 


Nngabu 


D-N-methylmethionine 


Dnmmet 


N-methylcyclopentylalanine 


Nmcpen 


D-N-methylphenylalanine 


Dnmphe 


D-N-methylproline 


Dnmpn) 


D-N-methylserine 


Dnmser 


D-N-methylthreonine 


Dnmthr 


N-( 1 -methylethyl)glycine 


Nval 


N-methyla-napthylalanine 


Nmanap 


N-methylpenicillamine 


Nmpen 


N-(p-hydroxyphenyl)glycine 


Nhtyr 


N-(thiomethyl)glycine 


Ncys 


penicillamine 


Pen 


L-a-methylalanine 


Mala 


L-a-methylasparagine 


Masn 


L-a-methyl-f-butylglycine 


Mtbug 


L-methylethylglycine 


Metg 


L-a-methylglutamate 


Mglu 


L-a-methylhomophenylalanine 


Mhphe 


N-(2-methylthioethyl)glycine 


Nmet 


L- a-methy lly sine 


Mlys 


L- a-methy lnorleucine 


Mnle 


L-a-methylornithine 


Mom 


L-a-methylproline 


Mpro 


L- a-methy lthreonine 


Mthr 


L-a-methyltyrosine 


Mtyr 



WO 98/53061 



PCT/AU98/00380 



L-α-methylvaline                                  Mval
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine    Nnbhm
1-carboxy-1-(2,2-diphenylethylamino)cyclopropane  Nmbc
L-N-methylhomophenylalanine                       Nmhphe
N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine   Nnbhe



Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6,
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In
addition, peptides can be conformationally constrained by, for example, incorporation of Cα and
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming an
amide bond between the N and C termini, between two side chains or between a side chain and
the N or C terminus.

Such analogues also apply in respect of MCG4 and MCG7.

The present invention further contemplates chemical analogues of MCG18 capable of acting as
antagonists or agonists of MCG18 or which can act as functional analogues of MCG18.
Chemical analogues may not necessarily be derived from MCG18 but may share certain
conformational similarities. Alternatively, chemical analogues may be specifically designed to
mimic certain physiochemical properties of MCG18. Chemical analogues may be chemically
synthesised or may be detected following, for example, natural product screening.

The identification of MCG18 permits the generation of a range of therapeutic molecules capable
of modulating expression of MCG18 or modulating the activity of MCG18. Modulators
contemplated by the present invention include agonists and antagonists of MCG18 expression.
Antagonists of MCG18 expression include antisense molecules, ribozymes and co-suppression
molecules. Agonists include molecules which increase promoter ability or interfere with negative
regulatory mechanisms. Agonists of MCG18 include molecules which overcome any negative
regulatory mechanism. Antagonists of MCG18 include antibodies and inhibitor peptide
fragments.

These types of modifications may be important to stabilise MCG18 if administered to an
individual or for use as a diagnostic reagent.

Other derivatives contemplated by the present invention include a range of glycosylation variants
from a completely unglycosylated molecule to a modified glycosylated molecule. Altered
glycosylation patterns may result from expression of recombinant molecules in different host
cells.

Another embodiment of the present invention contemplates a method for modulating expression
of MCG18 in a human, said method comprising contacting the mcg18 gene encoding MCG18
with an effective amount of a modulator of mcg18 expression for a time and under conditions
sufficient to up-regulate or down-regulate or otherwise modulate expression of mcg18. For
example, a nucleic acid molecule encoding MCG18 or a derivative thereof may be introduced
into a cell to facilitate protection of that cell from becoming cancerous.

Another aspect of the present invention contemplates a method of modulating activity of MCG18
in a human, said method comprising administering to said mammal a modulating effective amount
of a molecule for a time and under conditions sufficient to increase or decrease MCG18 activity.
The molecule may be a proteinaceous molecule or a chemical entity and may also be a derivative
of MCG18 or a chemical analogue or truncation mutant of MCG18.

The present invention is further described with reference to the following non-limiting Examples.

EXAMPLE 1

A human gene (designated mcg4) was identified on chromosome 11q13 that on the basis of
sequence homology is predicted to encode a putative transcription factor of 310 amino acids
(Fig. 1). mcg4 is transcribed in several different cell lines (Fig. 7).

EXAMPLE 2

The expressed sequence tag (EST) database contains partial sequence data for the murine
(Fig. 2) and nematode (Fig. 3) homologues of mcg4.

EXAMPLE 3

MCG4 contains a sequence of cysteine residues within the N-terminal region of the protein that
resembles zinc-finger binding domains of a novel type, i.e. (HC3)2 [Fig. 4].

EXAMPLE 4

Sensitive sequence homology searches reveal that related cysteine-containing motifs are present
in another C. elegans protein (Fig. 5) as well as the GATA-binding transcription factor from
S. pombe (Fig. 6).

EXAMPLE 5

mcg4 will have commercial value due to its likelihood of encoding a novel transcription factor
that is highly conserved amongst organisms, thus suggesting an integral role in gene regulation.
mcg4 may also be involved in some way in tissue-specific or temporal regulation of certain genes,
thus making it a potential target for modulating expression of those downstream effectors.

EXAMPLE 6

Nucleotide sequence data generated from cosmid clone cSRL-72c4 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned to
the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match numerous human and mouse entries (Table 4 and Figure 2).
These matching ESTs were further used to identify overlapping entries in the EST database
(Table 5). The nucleotide sequences of these human ESTs were compiled using MacVector
4.2.1 software (IBI-Kodak) to produce the cDNA sequence shown in Figure 1. EST entries
AA074703 and AA134788 are closely related at the nucleotide level to mcg4 and it is, therefore,
likely that mcg4 is a member of a newly discovered gene family (Figure 8).

The cDNA sequence of mcg4 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) at
the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). As the
protein appeared to be novel, a translation of the longest reading frame for the mcg4 cDNA was
aligned to the EST database using the program TBLASTN, which performed a dynamic
translation of the EST database in all 6 frames. The search results indicated that the nematode
C. elegans had an MCG4-like protein (Figure 3), with the matching domains containing a spatial
sequence of cysteine and histidine residues which resembled a zinc-finger structure (Figure 4).
The program BLASTP was used, therefore, to conduct sensitive searches of the protein
databases for similar zinc-finger motifs. A weak match to the putative zinc-finger domain was
observed for another protein from C. elegans (Figure 5) and a poorer match for the
GATA-binding transcription factor from S. pombe (Figure 6). The putative initiation codon of
human mcg4 is not preceded by an in-frame stop codon and it is therefore possible that the cDNA
described in Figure 1 is a truncated form. However, sequence alignment of human and mouse
mcg4 ESTs showed a lower degree of nucleotide conservation prior to the assigned initiation
codon, thus supporting the notion that the region represents the 5' UTR (Figure 9).

To determine the expression pattern of mcg4, 15 µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture were electrophoresed through 1.2% w/v
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1989). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabeled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg4. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg4 is expressed as a 1.6kb
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 7).
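
The six-frame "dynamic translation" that TBLASTN applies to a nucleotide database, as used in the search described above, can be illustrated with a minimal sketch. This is not the BLAST implementation: the function names and the short demonstration sequence are hypothetical, and only the standard genetic code is assumed.

```python
# Minimal sketch of six-frame translation (three frames on each strand).
# Assumption: standard genetic code; helper names are illustrative only.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from mcg4.
    for frame, protein in six_frame_translation("ATGGCC").items():
        print(frame, protein)
```

A protein query is then compared against each of the six translated frames of every database entry, which is why an EST can match regardless of its orientation or reading frame.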

EXAMPLE 7

A human gene (designated mcg7) was identified and isolated from chromosome 11q13 which
encodes a protein that bears striking homology with guanine nucleotide exchange factors (GEFs)
from a wide variety of organisms (Fig. 12).

EXAMPLE 8

The composite mcg7 cDNA sequence is at least 2.4kb in length and Figure 13(a) shows a
predicted translation product of at least 609 amino acids beginning at methionine 120. An
alternative start site due to alternate exon splicing (indicated in lower case) may yield a protein
of 671 amino acids starting at methionine 58 (Fig. 13a).

EXAMPLE 9

An mcg7 homologue from C. elegans has been identified, the product of which is highly
conserved with that of MCG7 (Fig. 14). There are several salient features of the protein which
have been underlined in Fig. 14, namely: a guanine nucleotide binding region, a diacylglycerol
binding region, and "EF-hand" calcium binding regions. In addition, there are several potential
cAMP, protein kinase C, and casein kinase II phosphorylation sites, as well as a number of
potential sites for glycosylation (not indicated).

EXAMPLE 10

A number of partial human and murine EST clones exist for mcg7. The GenBank database
contains a cDNA (Acc. no. Y12336) encoding a full-length open reading frame (ORF) for human
mcg7 as well as a partial murine mcg7 ORF (Y12339). In addition, the complete genomic
sequence of the human mcg7 gene is contained within GenBank entry AC000134.

EXAMPLE 11

The best characterised GEFs are members of the family of ras oncoproteins, which play a pivotal
role in signal transduction and when mutated are responsible for tumour development. A variety
of therapeutic regimes for cancer treatment have been designed to specifically interfere with the
ras signalling pathways. There is potential, therefore, that the product of mcg7 could also be a
target for such clinical strategies.

EXAMPLE 12 

The nucleotide sequence for mcg7 cDNA was extended 5' with genomic DNA sequence from
GenBank accession number AC000134 (positions 1-321) and analysed for additional coding
sequence 5' to the putative initiation codon (nt 681-683) (Fig. 16). An additional in-frame ATG
occurs at position nt 495-497 when the alternatively spliced exon (position nt 504-609) is present
(also shown in Fig. 13(a)). This closely matches the Kozak consensus. When this exon is
absent, then the ATG is not in-frame and other possible initiation codons are absent (resulting
translation shown in lower case lettering) (also shown in Fig. 13(b)). Further evidence that the
initiation codon at position nt 681-683 is the true initiation site is given in Figure 15.
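
The reading-frame statements in the preceding paragraph can be checked with simple arithmetic on the quoted nucleotide positions. The sketch below is illustrative only; the constant and function names are hypothetical, and it assumes that removing the exon simply shifts the downstream ATG upstream by the exon length.

```python
# Frame check for the two candidate mcg7 initiation codons, using only the
# nucleotide positions quoted in Example 12. Names are illustrative.

UPSTREAM_ATG = 495                 # first base of the additional ATG (nt 495-497)
ASSIGNED_ATG = 681                 # first base of the putative initiation codon
EXON_START, EXON_END = 504, 609    # alternatively spliced exon


def in_frame(pos_a, pos_b):
    """True if codons starting at the two 1-based positions share a frame."""
    return (pos_b - pos_a) % 3 == 0


exon_length = EXON_END - EXON_START + 1

# With the exon present the two ATGs are a whole number of codons apart.
with_exon = in_frame(UPSTREAM_ATG, ASSIGNED_ATG)
extra_codons = (ASSIGNED_ATG - UPSTREAM_ATG) // 3

# Assumption: without the exon, the downstream ATG moves upstream by exon_length.
without_exon = in_frame(UPSTREAM_ATG, ASSIGNED_ATG - exon_length)

if __name__ == "__main__":
    print("exon length:", exon_length)
    print("in frame with exon:", with_exon, "| extra codons:", extra_codons)
    print("in frame without exon:", without_exon)
```

The exon length is not a multiple of three, so its removal shifts the frame, and the extra codons gained with the exon present equal the difference between the 671 and 609 amino acid products described in Example 8.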

Alignment of human and partial murine mcg7 cDNA sequences is shown in Figure 15. The
putative initiation codon is at position nt 360-362. Both murine ESTs appear to have an
upstream in-frame stop codon at position nt 326-328, downstream of the differentially spliced
exon and the sequence alignment thus suggests that this region represents the 5' UTR of mcg7.

Furthermore, similarity with the C. elegans homologue strongly suggests that the ATG codon at
position nt 360-362 encodes the N-terminus of MCG7.

EXAMPLE 13

Figure 17 shows data from experiments indicating that a truncated version of MCG7 when
expressed as a GST fusion protein (construct B in Fig. 18) can function as a Ras-guanine
nucleotide exchange factor. In brief, Ras (unprocessed and as a GST fusion protein) is loaded
with [3H]-GDP then incubated in the presence of excess cold GTP ± GST-MCG7. Full details of
this assay can be found in Porfiri et al.

EXAMPLE 14

Nucleotide sequence data generated from cosmid clone cSRL-20h12 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned
to the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match GenBank entries T78563 (clone 113434), T09103 (clone
HIBBP12) and AA035643 (clone 471819). EST clones 113434 and 471819 were obtained from
Genome Systems Inc. and these DNAs were sequenced on both strands with gene-specific
primers (Table 5) to generate the cDNA sequence of mcg7 shown in Figures 13(a) and (b).

The cDNA sequence of mcg7 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) and
the coding region was assigned on the basis of showing homology to the C. elegans protein
F25B3.3 (Figure 14). The mcg7 cDNA composite was suspected to contain a single nucleotide
error that originated from clone 471819 and the correct nucleotide sequence was, therefore,
sought by reverse transcription-polymerase chain reaction (RT-PCR) of the cDNA fragment
from a human cDNA pool. Total RNA was extracted from a human lymphoblastoid cell line
using an RNeasy Mini Kit (Qiagen). cDNA synthesis was conducted with the reverse
transcriptase Superscript II RNaseH- (GIBCO, BRL) and random hexamers using the procedure
recommended by the manufacturer (GIBCO, BRL). One fortieth of the cDNA mix was
subjected to 35 cycles of PCR using the following cycling conditions: 94°C for 30 seconds, 58°C
for 30 seconds and 72°C for 90 seconds. The 50 µl reaction mix consisted of 1x reaction buffer
(Dade Scientific), 2mM dNTP mix, 20pmol of primers (see Table 6) MCG7UF (within the
variably spliced exon of Figure 13(b), between nucleotide positions 184-201) and SGCADRV2
(between nucleotide positions 866-846 of Figure 13(a)) and 10 units of Dynazyme (Dade
Scientific). The resulting PCR product was cloned into the pGEM-T vector (Promega) using
standard methodology and sequenced using gene-specific primers. The correct nucleotide
sequence of mcg7 (as shown in Figure 13(a)) matches that of the recently released GenBank entry
Y12336. A partial mouse mcg7 cDNA sequence can also be found in GenBank entry Y12339.

EXAMPLE 15

The coding sequence of mcg7 was cloned into vectors for expression in both bacterial and
mammalian cells. In addition to the full-length constructs, the deletion constructs shown in
Figure 18 were designed to retain the guanine nucleotide exchange factor (GEF) domain. For
prokaryotic expression, the mcg7 coding region was inserted downstream of and in-frame with
the Sj26 cassette of the pGEX (Pharmacia) series of vectors (Smith and Johnson, 1988) using
standard cloning techniques (Sambrook et al, 1989). For mammalian expression, the mcg7
coding sequence was first myc-tagged at the N-terminus and then ligated into the expression
vector pc Exv-n using standard cloning techniques. Ligation junctions of the constructs were
sequenced, as the cloning strategies inadvertently changed or introduced additional amino acids
as shown below.

Construct (A): EST clone 113434 was digested with ApaI (Figure 13(a), nucleotide positions
1022 to >2416 (within the vector)), blunt-ended with T4 DNA polymerase according to the
specifications of the manufacturer (New England Biolabs) and ligated into the SmaI site of
pGEX-3X.

Sequence of the pGEX and mcg7 junction:

            pGEX-3X         mcg7 (1022)
Sj26 ...    GGG ATC CCC     CTG GTC          [SEQ ID NO:19]
            Gly Ile Pro
            additional amino acids

Construct (B): EST clone 113434 was digested with EcoRI (Figure 13(a), nucleotide
positions <695 (within the vector) to 1711) and ligated into the EcoRI site of pGEX-1.

Sequence of the pGEX and mcg7 junction:

            pGEX-1                 mcg7 (695)
Sj26 ...    GAA TTC GGC ACG AG     C CGA CGG     [SEQ ID NO:20]
            Glu Phe Gly Thr Ser
            additional amino acids

Construct (C): full-length mcg7: The pGEM-T clone containing the 5' end of the mcg7 coding
region was digested with ApaI (subsequently blunt-ended with T4 DNA polymerase) and BstXI
to liberate the fragment between nucleotide positions 336 and 830 of Figure 13(a). Clone
113434 was digested with BstXI and HindIII (vector derived) to liberate a fragment between
nucleotide positions 830 and >2416 (vector derived) of Figure 13(a). A pGEM-11Zf vector
(Promega) containing the myc-tag was digested with ApaI (subsequently blunt-ended with T4
DNA polymerase) and HindIII, and ligated with the 2 inserts described above.

Sequence of the myc-tag/mcg7 junction [SEQ ID NOs:21/22]:

myc-tag                            vector        BamHI   mcg7 5' UTR (337)         start
ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG  CCCGGGGCAGCT  ggatcc  GCAGCCCACCCCGCGCCGGCGGCC  ATG
MEQKLISEEDL                        PGAA          GS      AAHPAPAA                  M
                                   additional amino acids

The myc-tagged full-length mcg7 insert in pGEM-11Zf was then excised with SacI and HindIII
(both vector derived) and directionally cloned into the mammalian expression vector pEXV
(Beranger et al, 1994).

Construct (D): Construct (C) in pGEM-11Zf was sequentially digested with HindIII (this site
was subsequently blunt-ended with T4 DNA polymerase) then BamHI, and ligated into
pGEX-2T digested with BamHI and SmaI. Digestion with BamHI removed the myc-tag of
Construct (C).

Sequence of the pGEX and mcg7 junction [SEQ ID NO:23/24]:

            pGEX-2T   BamHI     mcg7 (337)
Sj26 ...              gga tcc   GCA GCC CAC CCC GCG CCG GCG GCC ATG
                      Gly Ser   Ala Ala His Pro Ala Pro Ala Ala Met
                      additional amino acids
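
The junction sequences listed for Constructs (A) to (D) can be checked by translating them in the frame of the upstream cassette. The following is a minimal sketch under the assumption of the standard genetic code; the dictionary keys and helper names are hypothetical, and the nucleotide strings are the junctions as printed above with spaces removed.

```python
# Translate the construct junctions in frame to confirm the added residues.
# Assumption: standard genetic code; names below are illustrative only.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from position 0 (one-letter amino acid codes)."""
    seq = seq.replace(" ", "").upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Junction sequences as listed in Example 15.
JUNCTIONS = {
    "A": "GGG ATC CCC CTG GTC",
    "B": "GAA TTC GGC ACG AG C CGA CGG",
    "C_myc_tag": "ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG",
    "D": "gga tcc GCA GCC CAC CCC GCG CCG GCG GCC ATG",
}

if __name__ == "__main__":
    for name, dna in JUNCTIONS.items():
        print(name, translate(dna))
```

Under these assumptions the translations begin with the listed additional residues (Gly Ile Pro, Glu Phe Gly Thr Ser, and Gly Ser respectively), and the myc-tag sequence yields MEQKLISEEDL.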



EXAMPLE 16

Overnight bacterial cultures containing the pGEX plasmid were used to inoculate 500ml of Luria
Broth media containing 50 µg/ml ampicillin. The cultures were grown to an OD of ~0.8 and then
induced with 1mM of IPTG for up to 3 hours at 37°C. The bacteria were pelleted and
resuspended in 15 ml of STE buffer (10mM Tris pH 8.0, 150 mM NaCl and 1mM EDTA) with
1 mg/ml lysozyme. The mixture was left on ice for more than 1 hour and subsequent steps were
performed at 4°C. Protease inhibitors aprotinin, pepstatin and leupeptin were added at final
concentrations of 25 µg/ml, prior to the addition of Triton-X-100 (2% v/v final) and n-lauroyl
sarcosine (1.5% w/v final). The lysate was sonicated for ~1 minute and pelleted at 14,000 x g
for 15 minutes. 100 µl of 50% w/v glutathione-sephadex bead slurry (in PBS) was added per
ml of supernatant. Following a 30 minute incubation at 4°C, the beads were washed three times
with NETN (20mM Tris-HCl pH 8.0, 100mM NaCl, 1mM EDTA, 0.5% NP40), once with
NETN-HS (equivalent to NETN but with 1M NaCl), and once in NETN. The bound protein
was directly analysed by SDS-polyacrylamide gel electrophoresis (PAGE) as described below
or the bound protein was eluted from the beads with the following elution buffer (50mM Tris pH
8.0, 150mM NaCl, 5mM MgCl2, 1mM DTT, 10mM reduced glutathione) for use in GDP release
assays.



EXAMPLE 17

Twenty microlitres of GST-sepharose-bound MCG7 were added to an equal volume of 2 x
sample loading dye (100mM Tris pH 6.8, 2% v/v mercaptoethanol, 4% w/v SDS, 0.2% w/v
bromophenol blue, 20% v/v glycerol), boiled for 5 min and loaded onto a 7.5% w/v SDS-PAGE
gel (Sambrook et al, 1989). The Coomassie brilliant blue stained gel (Sambrook et al, 1989)
typically displayed a protein doublet, running between 87-95 kDa, consisting of the MCG7-GST
fusion and a slightly smaller, co-purified contaminating E. coli protein of ~105kDa. The
calculated molecular weight of full-length MCG7 is 77.5 kDa (Construct (D)) and the GST
component has a molecular weight of 26kDa, giving a predicted fusion of 103.5 kDa; hence, the
recombinant protein runs slightly smaller than predicted. A Western blot of the same gel probed
with anti-GST antibody yields an MCG7-specific band at the same position as that of the stained
gel.

EXAMPLE 18

Assumptions: (a) GST-Ras molecular weight = 50 kD; (b) Concentration of GST-Ras solution
= 1 mg/ml = 20 µM; (c) [3H]-GDP is 1 mCi/ml and 13.3 Ci/mmol, therefore [3H]-GDP
concentration = 75 µM and 1 pmol [3H]-GDP = 15,466 cpm; (d) Elution buffer = Buffer E = 20
mM Tris-Cl, pH 7.5; 50 mM NaCl; 5 mM MgCl2; 1 mM DTT (added just before use). Buffer E
+ BSA = Buffer E + 1 mg/ml BSA (added just before use).

Mix together, in the following order and mix well after each addition:
10 µl (= 10 µg) GST-Ras (@ 1 mg/ml in Buffer E), 463 µl Buffer E + BSA, 7 µl [3H]-GDP, 10 µl
490 mM EDTA. Incubate @ RT for 10 min. Add 10 µl 0.5 M MgCl2 and mix well. Incubate
@ RT for 10 min. Place on ice. During the first incubation the excess EDTA concentration is
5 mM, during the second incubation the excess Mg concentration is 5 mM. The [3H]-GDP
concentration is 1 µM and the final concentration of GST-Ras is 400 nM. Thus 20 µl of the final
mix will contain 8 pmol of GST-Ras protein. Specific activity of GDP is 15,466 cpm/pmol x
(1/1.4) = 11,047 cpm/pmol.
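
The concentrations quoted in this Example follow from the stated stock solutions and volumes, and can be reproduced with a short calculation. The sketch below is illustrative only; the variable names are hypothetical, the volumes are taken to be in microlitres, and the 1/1.4 factor is used as given in the text without asserting its derivation.

```python
# Reproduce the GDP-loading arithmetic of Example 18 from the stated stocks.
# Assumptions: all volumes in microlitres; names are illustrative only.

RAS_MW_G_PER_MOL = 50_000                  # GST-Ras, 50 kD
RAS_STOCK_MG_PER_ML = 1.0
GDP_ACTIVITY_MCI_PER_ML = 1.0
GDP_SPECIFIC_ACTIVITY_CI_PER_MMOL = 13.3
CPM_PER_PMOL_UNDILUTED = 15_466
DILUTION_FACTOR = 1.4                      # factor quoted in the text

# Volumes added to the loading mix, in order of addition.
VOLUMES_UL = {"ras": 10, "buffer_bsa": 463, "gdp": 7, "edta": 10, "mgcl2": 10}


def loading_mix():
    """Return the derived concentrations and amounts for the loading mix."""
    # mg/ml equals g/l; dividing by g/mol gives mol/l, converted here to uM.
    ras_stock_um = RAS_STOCK_MG_PER_ML / RAS_MW_G_PER_MOL * 1e6
    # mCi/ml equals Ci/l; dividing by Ci/mmol gives mmol/l, converted to uM.
    gdp_stock_um = GDP_ACTIVITY_MCI_PER_ML / GDP_SPECIFIC_ACTIVITY_CI_PER_MMOL * 1e3
    total_ul = sum(VOLUMES_UL.values())
    ras_final_nm = ras_stock_um * VOLUMES_UL["ras"] / total_ul * 1e3
    gdp_final_um = gdp_stock_um * VOLUMES_UL["gdp"] / total_ul
    # nM multiplied by ul gives femtomoles; divide by 1000 for picomoles.
    ras_pmol_per_20ul = ras_final_nm * 20 / 1e3
    diluted_cpm_per_pmol = CPM_PER_PMOL_UNDILUTED / DILUTION_FACTOR
    return {
        "ras_stock_um": ras_stock_um,
        "gdp_stock_um": gdp_stock_um,
        "total_ul": total_ul,
        "ras_final_nm": ras_final_nm,
        "gdp_final_um": gdp_final_um,
        "ras_pmol_per_20ul": ras_pmol_per_20ul,
        "diluted_cpm_per_pmol": diluted_cpm_per_pmol,
    }


if __name__ == "__main__":
    for key, value in loading_mix().items():
        print(f"{key}: {value:.2f}")
```

Rounded to the precision used in the text, these values correspond to 20 µM and 75 µM stocks, a 500 µl final volume, 400 nM GST-Ras, about 1 µM [3H]-GDP, 8 pmol of GST-Ras per 20 µl and 11,047 cpm/pmol.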

EXAMPLE 19

Exchange Ras with labelled GDP as above. Add unlabelled GTP (stock = 100mM, pH 7) to 1
mM. Adjust Mg concentration by adding 5 µl 0.5 M EDTA to labelled Ras, 5 µl 0.5 M EDTA to
500 µl MCG7, and 5 µl 0.5 M EDTA to 500 µl Buffer E + BSA. On ice set up microfuge tubes
with 40 µl Ras-GDP (in triplicate) with 40 µl MCG7 or Buffer E + BSA (control). Transfer tubes
to heat block @ 25°C and incubate for 10, 20 or 30 min. Stop exchange reactions with 1ml of
ice cold Buffer E and place on ice. Pre-soak nitrocellulose filters, pore size 45 µm, in Buffer E.
Assemble the vacuum manifold apparatus (Millipore) with wet filters and plug the wells with
rubber bungs. Switch on the vacuum pump. Remove the first plug, aliquot the sample and once
it has been sucked through, wash the filter with 10ml of ice cold Buffer E. Remove the next plug
and so on, continuing round the manifold. Take the manifold apart. Pin the filters to a pin board
reserved for [3H]. Air dry. Take up in 4ml scintillation fluid and count. These studies have been
carried out with a truncated MCG7-GST fusion protein (amino acids 341 of Figure 13a to stop
encoded within construct B).

EXAMPLE 20

A human gene was identified from chromosome 11q13 that encodes a new member of the DnaJ
family of proteins (designated MCG18). This gene (mcg18) is expressed as an ~1.4kb mRNA
(Fig. 28) and is predicted to encode a 241 amino acid product (Fig. 19).

EXAMPLE 21

MCG18 has partial homology to E. coli dnaJ and other human DnaJ family members in that it
contains the J domain (Fig. 20).

EXAMPLE 22

MCG18 has greatest homology to functionally undefined proteins from C. elegans (Fig. 21) and
S. pombe (Fig. 22) that also feature the J domain but maintain sequence similarity through the
central and C-terminal regions of the proteins.

EXAMPLE 23

The J domain is proposed to mediate interaction with heat shock protein 70 (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein. One of these
proteins, tumorous imaginal discs (Tid58) from Drosophila virilis (Fig. 23), functions as a
tumour suppressor.

EXAMPLE 24

A comparison of homology between MCG18 and human DnaJ proteins HDJ-2/HSDJ,
HDJ-1/HSP40 and HSJ1 is shown in Fig. 24.

EXAMPLE 25

During the sequence characterisation of the VRF/VEGFB promoter region on cosmid CLGW4
[Grimmond et al, 1996], which maps to chromosome 11q13, the inventors identified a sequence
that exactly matched numerous human and mouse expressed sequence tags (ESTs) in the EST
database from a gene which was designated mcg18. EST clones for human (GenBank accession
number T69741, clone 108172; accession number H40901, clone 177008) and mouse mcg18
(accession number W34884, clone 350966; accession number W64183, clone 385535) were
obtained from Genome Systems Inc. and sequenced with the gene-specific primers shown in
Table 7. The EST clones listed in Table 8 were also utilised in generating the full-length coding
sequence for human (Figure 19) and mouse (Figure 25) mcg18. The EST database also
contained mcg18 cDNA entries that were alternately (or partially) spliced, and in order to
understand their ability to encode new polypeptides, the gene structure of mcg18 was determined
by sequencing human and mouse genomic templates with gene-specific primers.

Genomic fragments containing the human [Grimmond et al, 1996] and murine genes [Townson
et al, 1996] have been previously reported. Cosmid CLGW4 contains the entire human gene
and λ121 contains the entire mouse gene, as determined by direct sequencing of the templates
with the oligonucleotides listed in Table 7. Plasmids containing sub-fragments of λ121 and
cosmid CLGW4 were prepared using plasmid purification kits (Qiagen) and sequenced as
described previously [Grimmond et al, 1996; Townson et al, 1996] using primers designed
against cDNA and genomic sequences. The BLAST suite of programs [Altschul et al, 1990]
was used to compare the sequence data against the nucleotide and protein databases at the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). The sequence
data were compiled using MacVector 4.2.1 software (IBI-Kodak). ClustalW sequence
alignments [Thompson et al, 1994] were conducted using the Australian National Genome
Information Service computer facility at the University of Sydney, Australia.

The cDNA sequence of human mcg18 (Figure 19) was translated in all possible reading frames 
and compared to the GenBank non-redundant protein database using the program BLASTX 
[Altschul et al, 1990], and the coding region was identified on the basis of its homology to 
the DnaJ family of proteins (Figure 20). The DnaJ domain is encoded within the longest open 
reading frame and the assigned initiation codon is preceded by an in-frame stop codon (Figure 
27). Similar database search results were obtained for the mouse mcg18 cDNA, and the 
alignment of human and mouse protein sequences is shown in Figure 26. MCG18 has greatest 
homology to gene products from C. elegans (Figure 21) and S. pombe (Figure 22). Although 
it shares a similar J-domain, MCG18 does not contain other domains described for the tumour 
suppressor gene from D. virilis (Figure 23), nor is it a homologue of other reported human 
J-domain-containing proteins (Figure 24). 
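
The reading-frame analysis described above can be illustrated with a minimal Python sketch. It is not the BLASTX program and makes no claim about the software actually used; it only shows a six-frame translation followed by a simple open-reading-frame scan, assuming the standard genetic code. All function names are hypothetical, and the test fragment is the first 53 nucleotides of SEQ ID NO:2 as given in the sequence listing. 

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order (assumption:
# no alternative genetic code applies to the sequences in question).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map (strand, offset) to the protein translation of that reading frame."""
    seq = seq.upper().replace(" ", "")
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = translate(template[offset:])
    return frames


def longest_orf(protein):
    """Longest Met-initiated segment running to the next stop or the sequence end."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


# First 53 nucleotides of SEQ ID NO:2 (5' untranslated region plus eight codons).
FRAGMENT = "TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG"

if __name__ == "__main__":
    for frame, protein in six_frame_translations(FRAGMENT).items():
        print(frame, protein, longest_orf(protein))
```

In this sketch a candidate coding region is simply the longest methionine-initiated stretch in any frame; checking that an in-frame stop codon lies upstream of the chosen initiation codon, as described for mcg18, would be a further step on the same translated frame. 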

To determine the expression pattern of mcg18, 15 µg of total cellular RNA (RNeasy Mini Kit, 
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2% 
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer 
using 20 x SSC (Sambrook et al, 1986). Filters were subsequently UV-fixed and hybridised 
overnight at 65°C to a radiolabelled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for 
mcg18. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried 
and exposed to X-ray film. This Northern analysis showed that mcg18 is expressed as a 1.4 kb 
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 28). 
TABLE 4 

ESTs matching mcg4 

accession number        seq. run / organism                        score   E value    N 
gb|AA399110|AA399110    zt89e06.s1 Soares testis NHT Homo sa.      1136    4.0e-168   2 
gb|N39612|N39612        yy51g06.s1 Homo sapiens cDNA clone 2.      1521    5.3e-168   4 
gb|AA514406|AA514406    nf57d01.s1 NCI_CGAP_Co3 Homo sapiens.      931     5.5e-166   3 
gb|AA544946|AA544946    vk38e02.r1 Soares mouse mammary glan.      1207    8.4e-164   2 
gb|AA450076|AA450076    zx42a04.s1 Soares total fetus Nb2HF8.      691     2.3e-160   4 
gb|AA535731|AA535731    nf88f07.s1 NCI_CGAP_Co3 Homo sapiens.      796     3.5e-158   4 
gb|W79710|W79710        zd86f01.r1 Soares fetal heart NbHH19.      1644    1.1e-157   4 
gb|AA503531|AA503531    ne47e08.s1 NCI_CGAP_Co3 Homo sapiens.      736     4.0e-156   4 
gb|AA450132|AA450132    zx42a04.r1 Soares total fetus Nb2HF8.      1955    3.9e-155   1 
gb|AA398068|AA398068    zt89f06.r1 Soares testis NHT Homo sa.      1315    5.4e-148   2 
gb|W60405|W60405        zd29h08.r1 Soares fetal heart NbHH19.      1022    1.8e-139   4 
gb|W81382|W81382        zd86f01.s1 Soares fetal heart NbHH19.      605     3.5e-125   5 
gb|AA047617|AA047617    zf13f07.s1 Soares fetal heart NbHH19.      922     4.6e-125   2 
gb|AA282175|AA282175    zt02d03.s1 NCI_CGAP_GCB1 Homo sapien.      1577    2.0e-123   1 
gb|AA242159|AA242159    rm/30d04.r1 Barstead mouse pooled org.     866     7.7e-117   2 
gb|AA068680|AA068680    mm61a05.r1 Stratagene mouse embryoni.      1280    1.6e-98    1 
gb|W46766|W46766        zc36b07.s1 Soares senescent fibrobla.      506     9.6e-92    3 
gb|N93704|N93704        zb51c04.s1 Soares fetal lung NbHL19W.      584     9.0e-91    4 
gb|AA155210|AA155210    mr98e01.r1 Stratagene mouse embryoni.      840     7.6e-87    2 
gb|AA366022|AA366022    EST76915 Pineal gland II Homo sapien.      1077    2.4e-81    1 
gb|AA037691|AA037691    zk34h12.s1 Soares pregnant uterus Nb.      949     2.1e-80    2 
gb|W35374|W35374        zc07h03.s1 Soares parathyroid tumor.       1016    3.1e-76    1 
dbj|C00696|C00696       HUMGS0008251, Human Gene Signature.        1009    1.2e-75    1 
gb|T98249|T98249        ye59a07.s1 Homo sapiens cDNA clone 1.      998     6.7e-75    1 
gb|W21588|W21588        zb51c04.r1 Soares fetal lung NbHL19W.      484     1.1e-69    4 
gb|H32171|H32171        EST107015 Rattus sp. cDNA 5' end.          828     1.1e-60    1 
gb|AA108092|AA108092    mm89e06.r1 Stratagene mouse embryoni.      782     1.3e-60    2 
gb|AA017857|AA017857    mh44d10.r1 Soares mouse placenta 4Nb.      665     2.5e-60    2 
gb|AA037690|AA037690    zk34h12.r1 Soares pregnant uterus Nb.      540     9.4e-53    2 
gb|AA531006|AA531006    nj07b11.s1 NCI_CGAP_Pr22 Homo sapien.      535     5.4e-48    2 
gb|N46760|N46760        yy51g06.r1 Homo sapiens cDNA clone 2.      665     9.5e-47    1 
gb|W23584|W23584        zc71d03.s1 Soares fetal heart NbHH19.      457     1.8e-44    2 
gb|W42214|W42214        mc69h09.r1 Soares mouse embryo NbME1.      460     1.3e-38    3 
gb|AA244877|AA244877    mx25a04.r1 Soares mouse NML Mus musc.      429     2.9e-25    1 
gb|W32939|W32939        zc07h03.r1 Soares parathyroid tumor.       320     4.8e-18    1 
TABLE 5 

ESTs matching AA074703 (mcg4-related cDNA) 

Database: Non-redundant Database of GenBank EST Division 
1,222,625 sequences; 449,352,662 total letters. 

Sequences producing High-scoring Segment Pairs (High Score; Smallest Sum Probability P(N); N): 

accession number        seq. run      organism                        score   E value    N 
gb|AA074703|AA074703    zm76g07.r1    Stratagene neuroepitheli...     2071    4.0e-167   1 
gb|AA068680|AA068680    mm61a05.r1    Stratagene mouse embryon...     1270    ?.4e-145   4 
gb|AA134788|AA134788    zm81g02.r1    Stratagene neuroepitheli...     946     1.3e-144   5 
gb|AA399110|AA399110    zt89e06.s1    Soares testis NHT Homo s...     520     8.7e-119   6 
gb|N39612|N39612        yy51g06.s1    Homo sapiens cDNA clone ...     582     9.6e-110   7 
gb|AA282175|AA282175    zt02d03.s1    NCI_CGAP_GCB1 Homo sapie...     771     9.4e-80    3 
gb|W81382|W81382        zd86f01.s1    Soares fetal heart NbHH1...     329     1.6e-75    6 
gb|AA544946|AA544946    vk38e02.r1    Soares mouse mammary gla...     644     9.6e-63    2 
gb|W35374|W35374        zc07h03.s1    Soares parathyroid tumor...     294     4.5e-42    4 
gb|W57106|W57106        md57c12.r1    Soares mouse embryo NbME...     394     1.9e-30    2 
gb|AA244877|AA244877    mx25a04.r1    Soares mouse NML Mus mus...     162     2.1e-27    4 
gb|AA017857|AA017857    mh44d10.r1    Soares mouse placenta 4N...     230     3.7e-23    3 
gb|AA531006|AA531006    nj07b11.s1    NCI_CGAP_Pr22 Homo sapie...     139     2.3e-19    3 
gb|H32171|H32171        EST107015     Rattus sp. cDNA 5' end.         207     2.6e-10    2 
gb|W79710|W79710        zd86f01.r1    Soares fetal heart NbHH1...     157     0.0073     1 
TABLE 6 

mcg7-specific oligonucleotides 

name             sequence (5' to 3')             SEQ ID NOs. 
M1044R           GGA CAA AGT GTG TGA TGA ACC     SEQ ID NO:25 
MCG7-GEF-REV2    CTC ATC CTC CGT CTG ATA CTG     SEQ ID NO:26 
M7R              GTA GAT GTG GAT CAG CTT GG      SEQ ID NO:27 
MCG7 CA FOR      AGG TGG AGA ATG GTC AAGG        SEQ ID NO:28 
MCG7-GEF-REV     GTC ATA GTC TGT CTC CTA CT      SEQ ID NO:29 
MCG7 GEF FOR     ACA TAG ACA GCG TGC CTA CC      SEQ ID NO:30 
MCG7-PKC-REV     TAC AAC CTT AGG GAC ACC AG      SEQ ID NO:31 
MCG7-PKC-FOR     TGC TGA GCC TGC TCA CGG TG      SEQ ID NO:32 
T09103F          CAA GTG AAC AGC ACG TCC         SEQ ID NO:33 
M7F              GAC TAT CTC AAG GAC CAG CTG     SEQ ID NO:34 
MCG7UF           GGT TCG GTC CGA GCC CGG         SEQ ID NO:35 
SGCADRV2         GGA GCG ATA CTC CAA GTA GGT     SEQ ID NO:36 
TABLE 7 

mcg18-SPECIFIC OLIGONUCLEOTIDES 

name          sequence 5' to 3' 
HVESTF        AGC GGG CCA GGC CCC TTC [SEQ ID NO:37] 
HV195F        CAT CCT GGT CCA ATG CGC TC [SEQ ID NO:38] 
HV387F2       GCA CTG AGG AAG TTA AAC GAG C [SEQ ID NO:39] 
HV408R        GCT CGT TTA ACT TCC TCA GTG C [SEQ ID NO:40] 
EXON1REV      GCT CAG CTC CAC AAA GCG GCT [SEQ ID NO:41] 
HVEST426F     ACC AGC TCC GCT CAG GTA G [SEQ ID NO:42] 
HVEST623R     TCC AGG AGC TGT GTG TTT GG [SEQ ID NO:43] 
SGVESTF3      CCA GTT TCA CAG CGT GAG G [SEQ ID NO:44] 
HVEST631R     CAG CAT GAG GAG GAG GCA G [SEQ ID NO:45] 
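
A short Python sketch can be used to sanity-check primer pairs such as those in Table 7. The example below takes HV387F2 (SEQ ID NO:39) and HV408R (SEQ ID NO:40) from the table and tests whether one is the reverse complement of the other. The melting-temperature estimate uses the Wallace rule of thumb, 2 x (A+T) + 4 x (G+C), which is an assumption introduced here for illustration only; the specification does not state how primers were designed or what annealing temperatures were used. Function names are hypothetical. 

```python
def normalise(seq):
    """Strip the triplet spacing used in the table and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return normalise(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = normalise(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule; assumption, see text)."""
    seq = normalise(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Sequences copied from Table 7.
PRIMERS = {
    "HV387F2": "GCA CTG AGG AAG TTA AAC GAG C",
    "HV408R": "GCT CGT TTA ACT TCC TCA GTG C",
}

if __name__ == "__main__":
    forward, reverse = PRIMERS["HV387F2"], PRIMERS["HV408R"]
    print("complementary pair:", reverse_complement(forward) == normalise(reverse))
    for name, seq in PRIMERS.items():
        print(name, len(normalise(seq)), gc_fraction(seq), wallace_tm(seq))
```

The two primers cover the same stretch of sequence on opposite strands, so they allow the same position to be read in both directions. 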
TABLE 8 

EST CLONE SEQUENCES USED TO GENERATE HUMAN AND MOUSE 
mcg18 cDNA SEQUENCE COMPOSITES 

EST clone number    organism    GenBank accession number 
lg2815              human       D45683 
001-T2-18           human       F17225 
273748              human       N37043 
177008              human       H40901 and H40939 
258011              human       N30776 
276887              human       N44004 
108172              human       T69741 
307529              human       W21083 and W32579 
342027              human       W60283 
354288              mouse       W44038 
350966              mouse       W34884 
426261              mouse       AA002868 
368185              mouse       W539U 
385535              mouse       W64183 
404472              mouse       W82959 
406437              mouse       W83482 
SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: (OTHER THAN US): The Council of The Queensland Institute of 

Medical Research 

(US ONLY): HAYWARD Nicholas, SILINS Ginters, GRIMMOND Sean, 
GARTSIDE Michael and HANCOCK, John 

(ii) TITLE OF INVENTION: NOVEL GENE AND USES THEREFOR 

(iii) NUMBER OF SEQUENCES: 45 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: DAVIES COLLISON CAVE 

(B) STREET: 1 LITTLE COLLINS STREET 

(C) CITY: MELBOURNE 

(D) STATE: VICTORIA 

(E) COUNTRY: AUSTRALIA 

(F) ZIP: 3000 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 
(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

Cys Xaa Xaa Cys Xaa Gly Xaa Gly 
1 5 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 30..959 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 



TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53 

Met Gly Leu Cys Lys Cys Pro Lys 
1 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 

Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149 

Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr 
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 

Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys 
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245 

Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp 
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293 

Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg 
75 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 

Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile 
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 

Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105 110 115 120 
AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437 
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu 
125 130 135 

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485 
Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp 
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533 
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His 
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 

ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725 
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp 
220 225 230 

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
235 240 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg 
250 255 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869 
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265 270 275 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
285 290 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser * 
300 305 310 

GCCCCCTTGC PTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022 
AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082 
CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142 
GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202 
ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 310 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

Met Gly Leu Cys Lys Cys Pro Lys Arg Lys Val Thr Asn Leu Phe Cys 
1 5 10 15 

Phe Glu His Arg Val Asn Val Cys Glu His Cys Leu Val Ala Asn His 
20 25 30 

Ala Lys Cys Ile Val Gln Ser Tyr Leu Gln Trp Leu Gln Asp Ser Asp 
35 40 45 

Tyr Asn Pro Asn Cys Arg Leu Cys Asn Ile Pro Leu Ala Ser Arg Glu 
50 55 60 

Thr Thr Arg Leu Val Cys Tyr Asp Leu Phe His Trp Ala Cys Leu Asn 
65 70 75 80 

Glu Arg Ala Ala Gln Leu Pro Arg Asn Thr Ala Pro Ala Gly Tyr Gln 
85 90 95 

Cys Pro Ser Cys Asn Gly Pro Ile Phe Pro Pro Thr Asn Leu Ala Gly 
100 105 110 

Pro Val Ala Ser Ala Leu Arg Glu Lys Leu Ala Thr Val Asn Trp Ala 
115 120 125 

Arg Ala Gly Leu Gly Leu Pro Leu Ile Asp Glu Val Val Ser Pro Glu 
130 135 140 

Pro Glu Pro Leu Asn Thr Ser Asp Phe Ser Asp Trp Ser Ser Phe Asn 
145 150 155 160 

Ala Ser Ser Thr Pro Gly Pro Glu Glu Val Asp Ser Ala Ser Ala Ala 
165 170 175 

Pro Ala Phe Tyr Ser Arg Ala Pro Arg Pro Pro Ala Ser Pro Gly Arg 
180 185 190 

Pro Glu Gln His Thr Val Ile His Met Gly Asn Pro Glu Pro Leu Thr 
195 200 205 

His Ala Pro Arg Lys Val Tyr Asp Thr Arg Asp Asp Asp Arg Thr Pro 
210 215 220 

Gly Leu His Gly Asp Cys Asp Asp Asp Lys Tyr Arg Arg Arg Pro Ala 
225 230 235 240 

Leu Gly Trp Leu Ala Arg Leu Leu Arg Ser Arg Ala Gly Ser Arg Lys 
245 250 255 

Arg Pro Leu Thr Leu Leu Gln Arg Ala Gly Leu Leu Leu Leu Leu Gly 
260 265 270 

Leu Leu Gly Phe Leu Ala Leu Leu Ala Leu Met Ser Arg Leu Gly Arg 
275 280 285 

Ala Ala Ala Asp Ser Asp Pro Asn Leu Asp Pro Leu Met Asn Pro His 
290 295 300 

Ile Arg Val Gly Pro Ser 
305 310 
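
The CDS feature of SEQ ID NO:2 (LOCATION: 30..959) and the stated length of SEQ ID NO:3 (310 amino acids) can be cross-checked arithmetically. The Python sketch below is illustrative only and is not part of the sequence listing; the function names are hypothetical, and it assumes that the location is a simple inclusive start..end range that excludes the stop codon. 

```python
def parse_location(location):
    """Parse a simple 'start..end' feature location into 1-based inclusive integers."""
    start, end = location.replace(" ", "").split("..")
    return int(start), int(end)


def cds_codon_count(location):
    """Number of complete codons spanned by an inclusive CDS location."""
    start, end = parse_location(location)
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    # SEQ ID NO:2: CDS 30..959, to be compared with the 310 residues of SEQ ID NO:3.
    print(cds_codon_count("30..959"))
```

Under the same assumption, the stop codon of SEQ ID NO:2 would occupy positions 960 to 962, which agrees with the running nucleotide count of 962 shown beside the final codon line of that sequence. 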



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2415 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3..2188 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

CG ATT TCA TTC CTC GCT CCC CAC AGG TCC CTC TCC CCA AAA TAT TCC 47 
Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser 
1 5 10 15 

CAT CTT GTC CTA GCC CAT CCC CCA GAC TAT CTC AAG GAC CAG CTG TCC 95 
His Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser 
20 25 30 

CCA CGC CCC CGA CCT CCA CTA GGC CTG TGC CAC CCG CTG CCT GCA GGA 143 
Pro Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly 
35 40 45 

AGA CGC CCG GTC CCG GGC CGG GTT AGC CCC ATG GGA ACG CAG CGC CTG 191 
Arg Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu 
50 55 60 

TGT GGC CGC GGG ACT CAA GGC TGG CCT GGC TCA AGT GAA CAG CAC GTC 239 
Cys Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val 
65 70 75 

CAG GAG GCG ACC TCG TCC GCG GGT TTG CAT TCT GGG GTG GAC GAG CTG 287 
Gln Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu 
80 85 90 95 

GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC CTG GGC 335 
Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly 
100 105 110 

CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC CTG GAC 383 
Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp 
115 120 125 

AAG GGC TGC ACG GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC 431 
Lys Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe 
130 135 140 

GAT GAC TCC GGG AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC 479 
Asp Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu 
145 150 155 

ATG ATG CAC CCC TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG 527 
Met Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu 
160 165 170 175 

CTC CAC ATC TAC CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG 575 
Leu His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln 
180 185 190 

GTG AAA ACG TGC CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG 623 
Val Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala 
195 200 205 
GAG TTT GAC TTG AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG 671 
Glu Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys 
210 215 220 

GCT CTG CTA GAC CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC 719 
Ala Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp 
225 230 235 

ATA GAC AGC GTC CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG 767 
Ile Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg 
240 245 250 255 

AAC CCT GTG GGA CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC 815 
Asn Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His 
260 265 270 

CTG GAG CCC ATG GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC 863 
Leu Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg 
275 280 285 

TCC TTC TGC AAG ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT 911 
Ser Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His 
290 295 300 

GGC TGC ACT GTG GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC 959 
Gly Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe 
305 310 315 

AAC AGC GTC TCA CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA 1007 
Asn Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr 
320 325 330 335 

GCC CCG CAG CGG GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG 1055 
Ala Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu 
340 345 350 

AAG CTG CTA CAG CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG 1103 
Lys Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly 
355 360 365 

GGC CTG AGC CAC AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC 1151 
Gly Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His 
370 375 380 

GTT AGC CCT GAG ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG 1199 
Val Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val 
385 390 395 

ACG GCG ACA GGC AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT 1247 
Thr Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys 
400 405 410 415 

GTG GGC TTC CGC TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG 1295 
Val Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val 
420 425 430 

GCC CTG CAG CTG GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG 1343 
Ala Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg 
435 440 445 

CTC AAC GGG GCC AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG 1391 
Leu Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu 
450 455 460 

GCC ATG GTG ACC AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG 1439 
Ala Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu 
465 470 475 
CTG AGC CTG CTC ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG 1487 
Leu Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu 
480 485 490 495 

CTG TAC CAG CTG TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA 1535 
Leu Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro 
500 505 510 

ACC AGC CCC ACG AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG 1583 
Thr Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu 
515 520 525 

GAG TGG ACC TCG GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG 1631 
Glu Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val 
530 535 540 

GAG CAC ATC GAG AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC 1679 
Glu His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val 
545 550 555 

GAT GGG GAT GGC CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG 1727 
Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly 
560 565 570 575 

AAC TTC CCT TAC CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT 1775 
Asn Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp 
580 585 590 

GGC TGC ATC AGC AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC 1823 
Gly Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser 
595 600 605 

TCT GTG TTG GGG GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC 1871 
Ser Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser 
610 615 620 

AAC TCC TTG CGC CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG 1919 
Asn Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu 
625 630 635 

GGC ATC TAC AAG CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC 1967 
Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys 
640 645 650 655 

CAC AAG CAG TGC AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC 2015 
His Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala 
660 665 670 

CAG AGT GTG AGC CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC 2063 
Gln Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His 
675 680 685 

AGC CAC CAT CAC CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG 2111 
Ser His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg 
690 695 700 

CGA GGC TCC AGG CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG 2159 
Arg Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val 
705 710 715 

GAG GAT GGG GTG TTT GAC ATC CAC TTG TA ATAGATGCTG TGGTTGGATC 2208 
Glu Asp Gly Val Phe Asp Ile His Leu 
720 725 

AAGGACTCAT TCCTGCCTTG GAGAAAATAC TTCAACCAGA GCAGGGAGCC TGGGGGTGTC 2268 
GGGGCAGGAG GCTGGGGATG GGGGTGGGAT ATGAGGGTGG CATGCAGCTG AGGGCAGGGC 2328 
CAGGGCTGGT GTCCCTAAGG TTGTACAGAC TCTTGTGAAT ATTTGTATTT TCCAGATGGA 2388 
ATAAAAAGGC CCGTGTAATT AACCTTC 2415 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 728 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser His 
1 5 10 15 

Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser Pro 
20 25 30 

Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly Arg 
35 40 45 

Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu Cys 
50 55 60 

Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val Gln 
65 70 75 80 

Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu Gly 
85 90 95 

Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly Pro 
100 105 110 

Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp Lys 
115 120 125 

Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp 
130 135 140 

Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met 
145 150 155 160 

Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu 
165 170 175 

His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val 
180 185 190 

Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu 
195 200 205 

Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala 
210 215 220 

Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile 
225 230 235 240 

Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn 
245 250 255 

Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu 
260 265 270 

Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser 
275 280 285 

Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly 
290 295 300 

Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn 
305 310 315 320 

Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala 
325 330 335 

Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys 
340 345 350 

Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly 
355 360 365 

Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val 
370 375 380 

Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr 
385 390 395 400 

Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val 
405 410 415 

Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala 
420 425 430 

Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu 
435 440 445 

Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala 
450 455 460 

Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu 
465 470 475 480 

Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu 
485 490 495 

Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr 
500 505 510 

Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu 
515 520 525 

Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu 
530 535 540 

His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp 
545 550 555 560 

Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn 
565 570 575 

Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly 
580 585 590 

Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser 
595 600 605 

Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn 
610 615 620 

Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly 
625 630 635 640 

Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His 
645 650 655 

Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln 
660 665 670 

Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser 
675 680 685 

His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg 
690 695 700 

Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu 
705 710 715 720 

Asp Gly Val Phe Asp Ile His Leu 
725 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 254..2083 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAGC CCCATGGGAA 180 

CGGGGTTCGG TCCGAGCCCG GTGGGAGGCT CCCGGAGCGC AGCCTGGGCC CAGCCCACCC 240 

CGCGCCGGCG GCC ATG GCA GGC ACC CTG GAC CTG GAC AAG GGC TGC ACG 289 
Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr 
1 5 10 

GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC GAT GAC TCC GGG 337 
Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly 
15 20 25 

AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC ATG ATG CAC CCC 385 
Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro 
30 35 40 

TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG CTC CAC ATC TAC 433 
Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr 
45 50 55 60 

CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG GTG AAA ACG TGC 481 
Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys 
65 70 75 



CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG GAG TTT GAC TTG 529 
His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu 
80 85 90 

AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG GCT CTG CTA GAC 577 
Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp 
95 100 105 

CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC ATA GAC AGC GTC 625 
Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val 
110 115 120 

CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG AAC CCT GTG GGA 673 
Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly 
125 130 135 140 

CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC CTG GAG CCC ATG 721 
Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met 
145 150 155 

GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC TCC TTC TGC AAG 769 
Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys 
160 165 170 

ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT GGC TGC ACT GTG 817 
Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val 
175 180 185 

GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC AAC AGC GTC TCA 865 
Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser 
190 195 200 

CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA GCC CCG CAG CGG 913 
Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg 
205 210 215 220 

GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG AAG CTG CTA CAG 961 
Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln 
225 230 235 

CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG GGC CTG AGC CAC 1009 
Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His 
240 245 250 

AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC GTT AGC CCT GAG 1057 
Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu 
255 260 265 

ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG ACG GCG ACA GGC 1105 
Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly 
270 275 280 

AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT GTG GGC TTC CGC 1153 
Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg 
285 290 295 300 

TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG GCC CTG CAG CTG 1201 
Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu 
305 310 315 

GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG CTC AAC GGG GCC 1249 
Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala 
320 325 330 

AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG GCC ATG GTG ACC 1297 
Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr 
335 340 345 
AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG CTG AGC CTG CTC 1345 
Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu 
350 355 360 

ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG CTG TAC CAG CTG 1393 
Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu 
365 370 375 380 

TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA ACC AGC CCC ACG 1441 
Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr 
385 390 395 

AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG GAG TGG ACC TCG 1489 
Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser 
400 405 410 



GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG GAG CAC ATC GAG 1537 
Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu
415 420 425 

AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC GAT GGG GAT GGC 1585 
Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly 
430 435 440 

CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG AAC TTC CCT TAC 1633 
His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr
445 450 455 460 

CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT GGC TGC ATC AGC 1681 
Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser
465 470 475 

AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC TCT GTG TTG GGG 1729 
Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly 
480 485 490 

GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC AAC TCC TTG CGC 1777 
Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg
495 500 505 

CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG GGC ATC TAC AAG 1825 
Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys
510 515 520 

CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC CAC AAG CAG TGC 1873 
Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys
525 530 535 540 

AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC CAG AGT GTG AGC 1921 
Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser
545 550 555 

CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC AGC CAC CAT CAC 1969 
Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser His His His 
560 565 570 

CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG CGA GGC TCC AGG 2017 
Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg 
575 580 585 

CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG GAG GAT GGG GTG 2065 
Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val
590 595 600 

TTT GAC ATC CAC TTG TAATAGATGC TGTGGTTGGA TCAAGGACTC ATTCCTGCCT 2120 
Phe Asp Ile His Leu
605 610 

TGGAGAAAAT ACTTCAACCA GAGCAGGGAG CCTGGGGGTG TCGGGGCAGG AGGCTGGGGA 2180

TGGGGGTGGG ATATGAGGGT GGCATGCAGC TGAGGGCAGG GCCAGGGCTG GTGTCCCTAA 2240 

GGTTGTACAG ACTCTTGTGA ATATTTGTAT TTTCCAGATG GAATAAAAAG GCCCGTGTAA 2300 

TTAACCTTC 2309

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 609 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
1 5 10 15

Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly Lys Val Arg Asp
20 25 30 

Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro Trp Tyr Ile Pro
35 40 45 

Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr Gln Gln Ser Arg
50 55 60 

Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys His Leu Val Arg
65 70 75 80 

Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu Asn Pro Glu Leu
85 90 95 

Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp Gln Glu Gly Asn
100 105 110 

Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val Pro Thr Tyr Lys
115 120 125 

Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly Gln Lys Lys Arg
130 135 140

Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met Glu Leu Ala Glu 
145 150 155 160 

His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys Ile Leu Phe Gln
165 170 175 

Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val Asp Asn Pro Val 
180 185 190 

Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser Gln Trp Val Gln
195 200 205 

Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg Ala Leu Val Ile
210 215 220 

Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe
225 230 235 240 

Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser
245 250 255 

Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu Thr Ile Lys Leu
260 265 270 

Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly Asn Tyr Gly Asn 
275 280 285 

Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg Phe Pro Ile Leu
290 295 300 

Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu Ala Leu Pro Asp
305 310 315 320 

Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala Lys Met Lys Gln
325 330 335 

Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr Ser Leu Arg Pro
340 345 350 

Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu Thr Val Ser Leu
355 360 365 

Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu Ser Leu Gln Arg
370 375 380 

Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr Ser Cys Thr Pro 
385 390 395 400 

Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser Ala Ala Lys Pro 
405 410 415 

Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu Lys Met Val Glu
420 425 430 

Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly His Ile Ser Gln
435 440 445 

Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr Leu Ser Ala Phe
450 455 460 

Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
465 470 475 480 

Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly Gly Arg Met Gly 
485 490 495 

Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys
500 505 510 

Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys
515 520 525 

Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu
530 535 540 

Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser Leu Glu Gly Ser
545 550 555 560 

Ala Pro Ser Pro Ser Pro Met His Ser His His His Arg Ala Phe Ser 
565 570 575 

Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg Pro Pro Glu Ile
580 585 590 

Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val Phe Asp Ile His
595 600 605 

Leu 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 832 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 11..733

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp
1 5 10

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97 
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25 

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145 
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala 
30 35 40 45 

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193 
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu 
50 55 60 

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241 
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val 
65 70 75 

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90 

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337 
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105 

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385 
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120 125 

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433 
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140 

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481 
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155 

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529 
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170 

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577 
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185 

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625
Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673 
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220 

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721 
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235 

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773 
Gly Ala Gly Pro * 
240 

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp Pro Arg Asn 
1 5 10 15

Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg Ser Arg Pro
20 25 30 

Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala Ser Thr Glu 
35 40 45 

Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu His Pro Asp 
50 55 60 

Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val Glu Leu Ser 
65 70 75 80

Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg Ser Tyr Asp
85 90 95 

Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg Thr Thr Val
100 105 110 

His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr Pro Pro Asn
115 120 125 

Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln Gly Pro Gln
130 135 140 

Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu Gly Tyr Cys
145 150 155 160 

Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile Ala Phe Arg
165 170 175 

Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys Asp Arg Ile
180 185 190 

Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg Ala Asn Arg
195 200 205

Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg Gln Pro Pro
210 215 220 

Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg Gly Ala Gly
225 230 235 240 

Pro 



SEQ ID Nos: 10-18 25-36 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 170..300

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAG CCC CAT 175
Pro His
1

GGG AAC GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC 223 
Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser 
5 10 15 

CTG GGC CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC 271 
Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp 
20 25 30 

CTG GAC AAG GGC TGC ACG GTG GAG GAG CT 300 
Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Pro His Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu
1 5 10 15

Arg Ser Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr
20 25 30

Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu
35 40



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGGATCCCCC TGGTC 15 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Asp Val Asp Glu Glu Asp Glu Val Glu Asp Ile Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Asp Val Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Asp His Asp Arg Asp Gly Phe Ile Ser Gln Glu Glu Phe
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp Val Asp Met Asp Gly Gln Ile Ser Lys Asp Glu Leu
1 5 10

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe Asn
1 5 10 15

Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser Arg
20 25 30 

Leu Lys Glu Thr His
35 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

Lys Phe Val His Val Ala Lys His Leu Arg Lys Ile Asn Asn Phe Asn
1 5 10 15 

Thr Leu Met Ser Val Val Gly Gly Ile Thr His Ser Ser Val Ala Arg
20 25 30 

Leu Ala Lys Thr Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys Arg His
1 5 10 15 

Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg
20 25 30 

Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu Ser Val
35 40 45 

Glu Cys 
50 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

His Asn Phe His Glu Thr Thr Phe Leu Thr Pro Thr Thr Cys Asn His 
1 5 10 15

Cys Asn Lys Leu Leu Trp Gly Ile Leu Arg Gln Gly Phe Lys Cys Lys
20 25 30 

Asp Cys Gly Leu Ala Val His Ser Cys Cys Lys Ser Asn Ala Val Ala 
35 40 45 

Glu Cys 
50 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGGATCCCCC TGGTC 15 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 
GAATTCGGCA CGAGCCGACG G 21 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 78 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ATGGAGCAGA AGCTGATCTC CGAGGAGGAC CTGCCCGGGG CAGCTGGATC CGCAGCCCAC 60 
CCCGCGCCGG CGGCCATG 78 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Pro Gly Ala Ala Gly
1 5 10 15 

Ser Ala Ala His Pro Ala Pro Ala Ala Met 
20 25 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
GGATCCGCAG CCCACCCCGC GCCGGCGGCC ATG 33 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met 
1 5 10

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GGACAAAGTG TGTGATGAAC C 21 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CTCATCCTCC GTCTGATACT G 21 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GTAGATGTGG ATCAGCTTGG 20 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
AGGTGGAGAA TGGTCAAGG 19 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GTCATAGTCT GTCTCCTACT 20

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
ACATAGACAG CGTGCCTACC 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
TACAACCTTA GGGACACCAG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
TGCTGAGCCT GCTCACGGTG 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CAAGTGAACA GCACGTCC 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 
GACTATCTCA AGGACCAGCT G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
GGTTCGGTCC GAGCCCGG 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GGAGCGATAC TCCAAGTAGG T 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
AGCGGGCCAG GCCCCTTC 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
CATCCTGGTC CAATGCGCTC 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
GCACTGAGGA AGTTAAACGA GC 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
GCTCGTTTAA CTTCCTCAGT GC 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 



GCTCAGCTCC ACAAAGCGGC T 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 



ACCAGCTCCG CTCAGGTAG 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 



TCCAGGAGCT GTGTGTTTGG 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 



CCAGTTTCAC AGCGTGAGG 19 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 



CAGCATGAGG AGGAGGCAG 19

CLAIMS:

1. An isolated nucleic acid molecule comprising a sequence of nucleotides encoding or
complementary to a sequence encoding an amino acid sequence having homology to a regulator 
of gene expression or a derivative of said gene regulator. 

2. An isolated nucleic acid molecule according to claim 1 wherein the regulator 
comprises a zinc finger domain of an (HC3)2 type.

3. An isolated nucleic acid molecule according to claim 2 wherein the sequence of 
nucleotides or complementary sequence of nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

4. An isolated nucleic acid molecule according to claim 1 wherein said gene regulator is 
a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

5. An isolated nucleic acid molecule according to claim 4 wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii).

6. An isolated nucleic acid molecule according to claim 1, wherein said gene regulator 
is a heat shock protein or is a heat shock binding protein or a derivative thereof. 

7. An isolated nucleic acid molecule according to claim 6, wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

8. A genetic construct comprising a vector portion and a gene portion comprising a 
regulator of gene expression or a derivative thereof.

9. A genetic construct according to claim 8 wherein the gene portion comprises a zinc 
finger domain of (HC3)2 type.

10. A genetic construct according to claim 9 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

11. A genetic construct according to claim 8 wherein said gene portion is a nucleotide
exchange factor (GEF) or derivative thereof. 

12. A genetic construct according to claim 11 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or
7;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the
nucleotide sequence set forth in (i), (ii) or (iii).

13. A genetic construct according to claim 8 wherein the gene portion is a heat shock
protein or a derivative thereof or a heat shock binding protein or derivative thereof.

14. A genetic construct according to claim 13 wherein the gene portion comprises a
nucleotide sequence selected from:

(i) a nucleotide sequence set forth in SEQ ID NO:8;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9;

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence
of (i) or (ii); and

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the
nucleotide sequence set forth in (i), (ii) or (iii).

15. A nucleic acid molecule encoding a gene regulator having the identifying
characteristics of a molecule selected from MCG4, MCG7 and MCG18 having respective amino
acid sequences of SEQ ID NO:3, SEQ ID NO:5 or 7 and SEQ ID NO:9.

16. A method of detecting a condition caused or facilitated by an aberration in mcg4, said
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg4 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

17. A method of detecting a condition caused or facilitated by an aberration in mcg4, said 
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG4 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

18. A method for detecting MCG4 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG4 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG4 
complex to form, and then detecting said complex. 

19. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg7 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

20. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG7 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

21. A method for detecting MCG7 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG7 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG7 
complex to form, and then detecting said complex. 

22. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg18 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

23. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

24. A method for detecting MCG18 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG18 or 
its derivatives or homologues for a time and under conditions sufficient for an antibody-MCG18 
complex to form, and then detecting said complex. 

TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53

Met Gly Leu Cys Lys Cys Pro Lys 

1 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 

Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 

10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 2 93 

Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gin Leu Pro Arg 
75 " 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 
Asn Thr Ala Pro Ala Gly Tyr Gin Cys Pro Ser Cys Asn Gly Pro He 
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105 110 H5 120 

AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437 
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu 
125 130 135 

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485 
He Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp 
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 53 3 

Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 • 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gin His Thr Val He His 
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 
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ACG CGG GAT GA . w CGG ACA CCA GGC CTC CAT Gvj*n jA- - A » jn. 

Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp 
220 225 230 

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
235 240 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg 
250 255 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869 
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265 270 275 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
285 290 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser 
300 305 310 

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022 
AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082 
CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142 
GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202 
ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242 
Figure 2 

gb|AA155210|AA155210 mr98e01.r1 Stratagene mouse embryonic carcinoma 
(#937317) Mus musculus cDNA clone 605496 5' 

Figure 3 

dbj|D75913|CELK111G3F C.elegans cDNA clone yk111g3 : 5' end, single read. 
Query: 7 PKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPLASRETT 66 
Figure 5 



sp|P46580|YLB5_CAEEL HYPOTHETICAL 146.8 KD PROTEIN C34E10.5 IN 

CHROMOSOME III gi|500728 (U10402) C34E10.5 gene product 
[Caenorhabditis elegans] 

Query: 56 CNIPLASRETTRLVCYDLFHWACLNERAAQLPRNTAPAGYQCPSC 100 

C+I L «• L C LFWO EA ♦ ♦ ♦ *CP C 

Sbjct: 1222 CSICLENKNPSALFCGHLFCOTCIOEHAVAATSSASTSSARCPQC 1266 



Figure 6 

gi |703468 (L29051) homologous to GATA-binding transcription factor 
[Schizosaccharomyces pombe] 

Query: 35 CIVQSYLQWLQDSDYNPNCRLCNI 58 

C «• *W *D MP C C ♦ 
Sbjct: 175 CATTNTPKWRKDESGNP ICNACGL 198 

Query: 162 SSTPGPEEVDSASAAPAFYSRAPRPPASPGRPEQHTVIHMGNPEPLTHAPRKVYDTRDDD 221 

♦S PEE S S S P* SP ♦ *Q P «V ♦ . D 

Sbjct: 441 ASIXNPEEPPSNSDKQPSMSNSPKSEVSPSQS^ 500 

Query: 222 RTPGLH 227 

R L* 
Sbjct: 501 RNYALN 506 
gb|AA074703|AA074703 zn.76g07.r1 Stratagene neuroepithelium (#937231) 
Homo sapiens cDNA clone 531612 5' 
Length = 417 

Plus Strand HSPs: 

Score = 818 (226.0 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103 
Identities = 206/259 (79%), Positives = 206/259 (79%), Strand = Plus / Plus 

° uery: 446 OGcercccivuAit^^ 

sbi Ct , n , I' 1 11111 11 "ilium ii in in i mim inm 

° Uery: 566 G f n 7 TO, iT?^ 625 

Sh5Ctl 229 CCCGAGCA «^^^ 288 

Query: 686 AAGGTGTATOATACGCGGG 704 

II II Hill II I II 
Sbjct: 289 AAACTATATOACACACCGG 307 

Score = 230 (63.6 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103 
Identities = 50/55 (90%), Positives = 50/55 (90%), Strand = Plus / Plus 

QUery: 398 ^T^CTGAGAGAGAA^^ 

sbict- 2 cUJLUJiiLlL'lJU^ iiiiiiiiiiiii iiiiimmiiimjMi 

Score = 175 (48.4 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103 
Identities = 39/44 (88%), Positives = 39/44 (88%), Strand = Plus / Plus 

° Uery: 767 «CT 1 U^lU^^jCTCGCCCX53CTGCTAAi3GAOCCOGGCTGC3GTC 810 

Sbict- rrilL "" """"" I""! I I I I I t I I 1 1 1 i f 1 I 1 I 
Sbjct: 373 GCTCTOCXXTOaCiaXXXaGCT^^ 416 

Score = 139 (38.4 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103 
Identities = 31/35 (88%), Positives = 31/35 (88%), Strand = Plus / Plus 

Query: 731 CKAGACTCnGACGATGACAACTACCGACGTCGGCC 765 

swot- ilUIJJ'U 1 1111 111 " urn 

Sbjct: 336 GGAGACTGTCATCATGACAAA.TACCGCCGCCOGCC 370 

Score = 133 (36.8 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103 
Identities = 29/32 (90%), Positives = 29/32 (90%), Strand = Plus / Plus 

Query: 701 CQGGATGATGACOGGACACCflGGCCTCCATGG 732 

sbict- JiL mM,m 11,111 11111 1 I'm 

SDjct. 305 CGGGATGATGAOCGGACAGCACOCATTCATGG 336 
gb|AA134788|AA134788 zm81g02.r1 Stratagene neuroepithelium (#937231) 
Homo sapiens cDNA clone 532082 5' 
Length = 368 

Plus Strand HSPs: 

Score = 563 (155.6 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87 
Identities = 147/190 (77%), Positives = 147/190 (77%), Strand = Plus / Plus 



Query: 498 
           iiiiiini ii iii ii minim n in in i mini 
Sbjct: 103 

Query: 558 
           ii mi i i mi n ii inn mum inn n n i mi 
Sbjct: 163 

Query: 618 CAGGCCGGCCXXVUSCAGCACACAGTGATCCACA^ 
           II llll IIIIIMIIIIIIIIII II IIIIIIM I I till II II I UN 
Sbjct: 223 CAAGCCGTCCCGAGCAGCAG&CA 

Query: 678 CCCCTAOGAA 687 
           iiii mil 
Sbjct: 283 

Score = 454 



Identities = 94/98 (95%), Positives = 94/98 (95%), Strand = Plus / Plus 
Query: 398 GCACTGAGJU3JU3AA£XITGGCC 457 
           lllllllllll Mill IIIIIIIIIIIMIIIMMIIIIMIIIIIIII 
Sbjct: 2 GCACTGAGAGACAAGCTAGCXIACAC7IXIAA 61 

Query: 458 ATCGATGAGGTCGTGACXX^^ 495 
           llllllllllll I lllllllllllllllllllllll 
Sbjct: 62 ATCGATGAGGTGATAAGCCCAGWXXXGAGCCCCTCAA 99 



Score = 219 (60.5 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87 
Identities = 51/60 (85%), Positives = 51/60 (85%), Strand = Plus / Plus 

Query: 702 GGC^TGATCACCGGACACCAGGCCTCC^ 761 

ii iiiiiiiiiiiii inn i mmiimm iiimii nut n i _ 

SbjCt: 309 QGATTCATGACCGGACAGCAGGCATTC 368 



Figure 9 

W32939 human TACCGCCCTTCGGAACCAGTGCAGC^ 
AA242159 mouse CITCCGCGCnTTTCATrACCGTAC^ 
MCG4 MGLCKCPKRK VTNLFCFEHR VNVCEHCLVA NHAKCIVQSY LQWLQDSDYN PNCRLCNIPL 60 

MCG4 ASRETTRLVC YDLFHWACLN ERAAQLPRNT APAGYQCPSC NGPIFPPTNL AGPVASALRE 120 

[ 229 ] 

5. " 

I 74 ] 



130 140 150 160 170 180 

KLATVNWARA GLGLPLIDEV VSPEPEPLNT SDFSDWSSFN ASSTPGPEEV DSASAAPAFY 
20 30 40 50 60 

1 J/z J - .******** i # ****"*s •*•*••**** *tt*svq**r a*tps*»**»> 

f 2 2di l 30 40 50 60 

1 J — aqs*s*sip * **** *tt*svq**r a*tps*»**«> 

r 3 229 , 20 30 40 50 60 | 

1 l2 * ] •••*•• x . s xrn^vq!. chhhlcarge sqh*icac«l> 



. 5 - A , 10 II 30 40 50 . 60 I 

[ 74 1 x smr-a q"s*-sipq tslig-pal- nppp*lckrr ep«lhlxlli> 

19 ° 200 210 220 230 240 



MCG4 



6. 

( 3B ] 



SRAPRPPASP GRPEQHTVIH MGNPEPLTHA PRKVYDTRDD DRTPGLHGDC DDDKYRRRPA 

t 



' ' i P * s*«***»**» "st*a'«" ****'**pgp *srhswetvn mtnt-aagl*> 

2- y 70 80 90 
[ 243 ] .\..... p .. s at . a . a > 

> I 

3 ; 70 80 90 100 | 110 120 

[ 229 J gsp*sslpk* s*a-a # sht* gey*s*g*r- *kek*m*hg* •** a *i # *** 

4 - 70 80 90 100 " 110 120 

I 86 J _p*sslpk* s*a-a*sht* gey*s*g*rp kesi*h»gnm tgqqafm**- •** c> 

h 

5 - 70 80 90 100 110 

( 74 ) arl*allppq av*sstqsyt w*vlk*w-*t *qgk*m**** ***a*i**> 

g 

I 

1 100 
»t *q*******> 

250 260 270 280 290 300 

**••** 
MCG4 LGWLARLLRS RAGSRKRPLT LLQRAGLLLL LGLLGFLALL ALMSRLGRAA ADSDPNLDPL 

1- 130 

< 372 ] q > 

4. * 
[ 86 ] s*-»»> 



MCG4 



310 

MNPHIRVGPS 
Search Analysis for Sequence: MCG4 



Matrix: pam250 matrix 



Search from 1 to 310 



Score Region from 1 to 310 



Date: September 22, 1997 



Maximum possible score: 1598 



Aligned sequences: 

1. = EST AA074703 phase 1 translation 

2. = EST AA134788 phase 3 translation 

3. = EST AA134788 phase 2 translation 

4. = EST AA074703 phase 3 translation 

5. = EST AA074703 phase 2 translation 

6. = EST AA134788 phase 1 translation 
FIGURE 11 Domains of MCG4 



zinc finger 



acidic 




leucine 
zipper basic 



138 171 



234 241 269 288 



zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C 
acidic domain consensus: 9/34 negatively charged amino acids, 0/34 positively charged 
basic domain consensus: 13/55 positively charged amino acids, 0/55 negatively charged 
leucine zipper domain consensus: LX ^IjyiX LX fi L 

alternate "novel" leucine zipper-like motif where leucine would not be aligned along 
the one surface of an alpha helix domain: (aa 261) LX6LXLX6LXLX6L (aa 286) 
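
The domain statements in Figure 11 can be checked against the Figure 1 translation with a short script. This is a sketch under stated assumptions, not part of the original analysis: it assumes the zinc finger spacings read X17 and X18 where the printed subscripts are hard to make out, it uses residue strings transcribed by hand from the Figure 1 translation, and the names `ZINC_FINGER`, `find_zinc_finger` and `charge_counts` are illustrative only. Histidine is not counted as positively charged here.

```python
import re

# Residues 1-104 of MCG4, transcribed from the Figure 1 translation.
MCG4_N_TERM = (
    "MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSY"
    "LQWLQDSDYNPNCRLCNIPLASRETTRLVCYDLFHWACLN"
    "ERAAQLPRNTAPAGYQCPSCNGPI"
)

# Residues 138-171 of MCG4 (the acidic domain in Figure 11).
ACIDIC_DOMAIN = "DEVVSPEPEPLNTSDFSDWSSFNASSTPGPEEVD"

# C-X2-H-X4-C-X2-C-X4-H-X2-C-X17-C-X2-C-X18-H-X2-C-X18-C-X2-C
ZINC_FINGER = re.compile(
    r"C.{2}H.{4}C.{2}C.{4}H.{2}C.{17}C.{2}C.{18}H.{2}C.{18}C.{2}C"
)


def find_zinc_finger(sequence):
    """Return 1-based (start, end) of the first consensus match, or None."""
    match = ZINC_FINGER.search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.end()


def charge_counts(sequence):
    """Return (negative, positive) counts: D/E versus K/R."""
    negative = sum(sequence.count(residue) for residue in "DE")
    positive = sum(sequence.count(residue) for residue in "KR")
    return negative, positive


if __name__ == "__main__":
    print(find_zinc_finger(MCG4_N_TERM))
    print(len(ACIDIC_DOMAIN), charge_counts(ACIDIC_DOMAIN))
```

With the sequence as transcribed, the consensus spans Cys16 to Cys100, and the 34-residue acidic segment contains 9 negatively charged and no positively charged residues, in agreement with the counts given in the figure.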



WO 98/53061 



11/32 



PCT/AU98/00380 



FIGURE 12 



Sequences producing High-scoring Segment Pairs: 



gnl|PID|e236178         (Z70752) F25B3.3 [Caenorhabditis ele.. 
gi|1293099              (U53884) aimless RasGEF [Dictyosteli.. 
gi|1655941              (U67326) Ras-GRF2 [Mus musculus] 
pir||S30356             CDC25 protein homolog - yeast (Candi.. 
sp|P43069|CC25_CANAL    CELL DIVISION CONTROL PROTEIN 25 
sp|P28818|GNRP_RAT      GUANINE NUCLEOTIDE RELEASING PROTEIN.. 
prf||1814463A           guanine nucleotide-releasing factor.. 
pir||B46199             nucleotide-exchange-factor homolog c.. 
gnl|PID|e238680         (X97560) hypothetical protein L1309.. 
pir||S22693             CDC25 protein homolog - mouse /gi|50.. 
sp|P14771|SC25_YEAST    SCD25 PROTEIN /gi|457494 (M26647) SD.. 
sp|P26674|STE6_SCHPO    STE6 PROTEIN /pir||S28098 Ste6 prote.. 
pir||S28407             CDC25 protein homolog - mouse 
sp|P27671|GNRP_MOUSE    GUANINE NUCLEOTIDE RELEASING PROTEIN.. 
gi|386047               (S62035) Ras-specific guanine nucleo.. 
sp|Q02342|CC25_SACKL    CELL DIVISION CONTROL PROTEIN 25 /pi.. 
pir||S14177             SCD25 protein - yeast (Saccharomyces.. 
gi|433720               (L26584) CDC25 [Homo sapiens] 
gnl|PID|e241744         (Z68880) T14G10.2 [Caenorhabditis el.. 
gi|3484                 (X03579) CDC25 protein (aa 1-1588) [.. 
sp|P04821|CC25_YEAST    CELL DIVISION CONTROL PROTEIN 25 /pi.. 
gi|915328               (U24070) Munc13-1 [Rattus norvegicus] 
pir||A46199             nucleotide-exchange-factor homolog c.. 
pdb|1PTR|               Molecule: Protein Kinase C Delta Ty.. 
gi|915330               (U24071) Munc13-2 [Rattus norvegicus] 
gi|474982               (D21239) 'C3G protein' [Homo sapiens.. 
gi|1763306              (U75361) Munc13-3 [Rattus norvegicus] 
gi|806957               guanine-nucleotide exchange factor C 
sp|Q03385|GNDS_MOUSE    GUANINE NUCLEOTIDE DISSOCIATION STIM.. 
pir||BVBYL1             LTE1 protein - yeast (Saccharomyces .. 
gi|452242               (D21354) a putative guanine nucleoti.. 
sp|P07866|LTE1_YEAST    LOW TEMPERATURE ESSENTIAL PROTEIN /p.. 
gi|509050               (Z22521) protein kinase C delta [Hom.. 
gi|520587               (D10495) protein kinase C delta-type.. 
sp|P05130|KPC1_DROME    PROTEIN KINASE C, BRAIN ISOZYME (PKC.. 
pir||S35704             protein kinase C (EC 2.7.1.-) delta .. 
sp|Q05655|KPCD_HUMAN    PROTEIN KINASE C, DELTA TYPE (NPKC-D.. 
pir||S40279             protein kinase C mu - human /pir||A5.. 
sp|P09215|KPCD_RAT      PROTEIN KINASE C, DELTA TYPE (NPKC-D.. 
gi|520878               (Z34524) serine/threonine protein ki.. 
gi|1519719              (U68142) RalGDS-like [Homo sapiens] 
FIGURE 13(a) (i) 

MCG7 - Cloning of a novel human gene that encodes a guanine exchange factor 

CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60 
ISFLAPHRSLSPKYSHLVL 19 
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120 
AHPPDYLKDQLSPRPRPPLG 39 
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180 
LCHPLPAGRRPVPGRVSPMG 59 
CGcagcgcctgtgtggccgcgggactcaaggctggcctggctcaagtgaacagcacgtcc 240 
TQRLCGRGTQGWPGSSEQHV 79 
aggaggcgacctcgtccgcgggtttgcattctggggtggacgagctggGGGTTCGGTCCG 300 
QEATSSAGLHSGVDELGVRS 99 
AGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCCCGCGCCGGCGGCCA 360 
EPGGRLPERSLGPAHPAPAA 119 
TGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCTGCTCCGCGGGTGCA 420 
MAGTLDLDKGCTVEELLRGC 139 
TCGAAGCCTTCGATGACTCCGGGAAGGTGCGGGACCCGCAGCTGGTGCGCATGTTCCTCA 480 
I E A F DDS GKVRD PQLVRMFL 159 
TGATGCACCCCTGGTACATCCCCTCCTCTCAGCTGGCGGCCAAGCTGCTCCACATCTACC 540 
MMH PWYI PS SQLAAK LL HIY 179 
AACAATCCCGGAAGGACAACTCCAATTCCCTGCAGGTGAAAACGTGCCACCTGGTCAGGT 600 
QQSRK.DNSNSLQVKTCHLVR 199 
ACTGGATCTCCGCCTTCCCAGCGGAGTTTGACTTGAACCCGGAGTTGGCTGAGCAGATCA 660 
YWI SAFPAEFDLNPELAEQI 219 
AGGAGCTGAAGGCTCTGCTAGACCAAGAAGGGAACCGACGGCACAGCAGCCTAATCGACA 720 
KELKALLDQEGNRRHSSLXD 239 
TAGACAGCGTCCCTACCTACAAGTGGAAGCGGCAGGTGACTCAGCGGAACCCTGTGGGAC 780 
I DSV PTYKWKRQVTQRNPVG 259 
AGAAAAAGCGCAAGATGTCCCTGTTGTTTGACCACCTGGAGCCCATGGAGCTGGCGGAGC 840 
QKKRKMSLL FDHLEPMELAE 279 
ATCTCACCTACTTGGAGTATCGCTCCTTCTGCAAGATCCTGTTTCAGGACTATCACAGTT 900 
H LTYL EYR S FCK ILFQD YHS 299 
TCGTGACTCATGGCTGCACTGTGGACAACCCCGTCCTGGAGCGGTTCATCTCCCTCTTCA 960 
FVTH GCTVD NPVLE RF I SLF 319 
ACAGCGTCTCACAGTGGGTGCAGCTCATGATCCTCAGCAAACCCACAGCCCCGCAGCGGG 1020 
NSV SQWVQ LMILSKPTAPQ R 339 
CCCTGGTCATCACACACTTTGTCCACGTGGCGGAGAAGCTGCTACAGCTGCAGAACrTC^ 1080 
A L V I THFVHVAEKLLQLQNF 359 
ACACGCTGATGGCAGTGGTCGGGGGCCTGAGCCACAGCTCCATCTCCCGCCTCAAGGAGA 114 0 
NTLMAVVGGLSHSS ISRLKE 379 
CCGACAGCCACGTTAGCCCrcAGACCATCAAGCTCTGGGAGGGTCTCACGGAACTAGTGA 1200 
THS HV S PET I KLWEGLTELV 399 
CGGCGACAGGCAACTATGGCAACTACCGGCGTCGGCTGGCAGCCTGTGTGGGCTTCCGCT 1260 
TATGNYGNYRRRLAACVGFR 419 
TCCCGATCCTGGGTGTGCACCTCAAGGACCTGGTGGCCCTGCAGCTGGCACTGCCTGACT 1320 
FPI LGVHLKDLVALQLALPD 439 
GGCTGGACCCAGCCCGGACCCGGCTGAACGGGGCCAAGATGAAGCAGCTCTTTAGCATCC 1380 
WL. DPARTRLNGAKMKQLFSX 459 
TGGAGGAGCTGGCCATGGTGACCAGCCTGCGGCCACCAGTACAGGCCAACCCCGACCTGC 1440 
LEELAMVTS LRPPVQANPDL 479 
TGAGCCTGCTCACGGTGTCTCTGGATCAGTATCAGACGGAGGATGAGCTGTACCAGCTGT 1500 
LS L L TVSLDQYQTEDELYQL 499 
CCCnX^GCGGGAGCCGCGCTCC^GTCCrCGCCAACCAGCCCCACGAGT^ 1560 
SLQ RE PRSKS S PTS PTSCTP 519 
CACCCCGGCCCCCGGTACTGGAGGAGTGGACCTCGGCTGCCAAACCCAAGCTGGATCAGG 1620 
PPRP PVLEEWTSAAKPKLDQ 539 
CCCTCGTGGTGGAGCACATCGAGAAGATGGTGGAGTCTGTGTTCCGGAACTTTGACGTCG 1680 
FIGURE 13(a) (ii) 

ALVVEHIEKMVESVFRNFDV 559 
ATGGGGATGGCCACATCTCACAGGAAGAATTCCAGATCATCCGTGGGAACTTCCCTTACC 1740 
DGDGHISQEEFQIIRGNFPY 579 
TCAGCGCCTTTGGGGACCTCGACCAGAACCAGGATGGCTGCATCAGCAGGGAGGAGATGG 1800 
LSAFGDLDQNQDGCISREEM 599 

TTTCCTATTTCCTGCGCTCCAGCTCTGTGTTGGGGGGGCGCATGGGCTTCGTACACAACT 1860 
VSYFLRSSSVLGGRMGFVHN 619 
TCCAGGAGAGCAACTCCTTGCGCCCCGTCGCCTGCCGCCACTGCAAAGCCCTGATCCTGG 1920 
FQESNSLRPVACRHCKALIL 639 
GCATCTACAAGCAGGGCCTCAAATGCCGAGCCTGTGGAGTGAACTGCCACAAGCAGTGCA 1980 
GIYKQGLKCRACGVNCHKQC 659 
AGGATCGCCTGTCAGTTGAGTGTCGGCGCAGGGCCCAGAGTGTGAGCCTGGAGGGGTCTG 2040 
KDRLSVECRRRAQSVSLEGS 679 
CACCCTCACCCTCACCCATGCACAGCCACCATCACCGCGCCTTCAGCTTCTCTCTGCCCC 2100 
APSPSPMHSHHHRAFSFSLP 699 
GCCCTGGCAGGCGAGGCTCCAGGCCTCCAGAGATCCGTGAGGAGGAGGTACAGACGGTGG 2160 
RPGRRGSRPPEIREEEVQTV 719 
AGGATGGGGTGTTTGACATCCACTTGTAATAGATGCTCTGGTTGGATCAAGGACTCATTC 2220 
EDGVFDIHL* 728 

CTGCCTTGGAGAAAATACTTCAACCAGAGCAGGGAGCCTGGGGGTGTCGGGGCAGGAGGC 2280 
TGGGGATGGGGGTGGGATATGAGGGTGGCATGCAGCTGAGGGCAGGGCCAGGGCTGGTGT 2340 
CCCTAAGGTTGTACAGACTCTTGTGAATATTTGTATTTTCCAGATGGAATAAAAAGGCCC 2400 
GTGTAATTAACCTTC (A) n 
FIGURE 13(b) 

CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60 
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120 
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180 

♦ p h g n 

CGG<^TTCGGTCCGAGCCCGGTGGGAGGCrCCCGGAGCGCAGCCTGGGCCCAGCCCACCC*2^0 
gvrsepggrlpersl gpa hp 

CGCGCCGGCGGC CATGG CAGGCACCCTGGACCTGGACAAGGGCTC 

a p a a N A G T L D LD KG C T V E E L 
FIGURE 14 

1 MAGTLDLDKGC . . . TVEELLRGCIEAF . . DDSGKVRDPQLVRMFLMMHPW 45 

|.:.:: |. | ::|: |:|-| |:.:.|| ::| : : : I • I 

1 MSSKVEEDQHQELLTEDQLVARCVECFDlffiEEDE^ 50 

46 YIPSSQLAAKLLHIYQQSRKDNSNSLQVKTCHLVRYWISAFPAEFDLNPE 95 

. .| | ..::s:||:.|. : .|= |.:||. II -II -h 

51 LSDSLSLITHFVNFYQETRNVEQRE . . . AVCRAVSFWIEKFPMHFDAQPQ 97 

96 LAEQIKELKALLDQEGNRRHSSLIDIDSVPTYKWKRQVTQRNPVGQKK . . 143 

:..|: ||.: :: .:. :.:| |:..:|.: I I • I • 111 = = --. ^ 
98 VCAQWRLKTIAEDINENIRNGL . DVSALPSFAWLRAVSVRNPLAKQTIV 146 

144 RKMSLLFDHLEPMELAEHLTYLEYR 168 

:|| I : -I I - - = » I I 

147 RVDFETLPTPGTPPPFPIASKKFSLTAFSLSFVQASPSDISTSLSHIDYR 196 

169 SFCKI LFQDYHS FVTHGCTVDNPVLERFI SLFNSVSQWVQLMILSKPTAP 218 

:::| = :.. :|..| . hill I I = I I - - I - I I 1 lll-l-h- _ 
197 VLSRISITELKQYVKDGHLRSCPMLERSISVFNNLSNWVQCMILNKTTPK 246 

219 np & T.WTT HFVHVAEKLLQLON PNTT.MAVVGGI,SHSSTSRLKETHSHVSPE 268 

:|| :s..|||||..| .:.||||||.|llh.|lh-ll -I" : ' : ^ 
247 ZKJ&XINKJZD^^ 296 

269 TIKLWEGLTELVTATGNYGNYRRRLAAC . VGFRFPILGVHLKDLVALQLA 317 

. | :..||:|:-| h-lh hll I I : I h I I M M I I h • • _ 
297 IKKELTQLTNLLSAQHNFCEYRKALGACNKKFRIPIIGVHLKDLVAINCS 34G 

318 LPDWLDPARTRLNGAKMKQLFSILEELAMVTSLRPPV . QANPDLLSLLTV 366 

::: . :.:.|: .| .:|.:: • • : I lh- M 

347 GANFEKT. . KCISSDKLVKLSKLLSNFLVFNQKGHNLPEMNMDLINTLKV 394 

367 SLDQYQTEDELYQLSLQREPRSKSSPTSPTSCTPPPRPPVLEEWTSAAKP 416 

Ml . : I : : | : | | | . | | | : . . • I - I • I : • I I • I : • ■ ■ 

395 S LDI RYNDDDI YELSLRREPKTFMN FEPSRGLVFAEWASGVTV 437 

417 KLiXJALVVEHIEKMVESVFRNFD^^ 466 

|.| | .||. ||s.||:s:| | || Mllllhl lllh-M^ AQn 
438 APPNATVSKHISAMVDAVFKHYPHnFTOFISQEEFQLIAGNFPFIDAFVN 487 

467 T.nn wnnflrTSREEM VSYFLRSS . SVIXSGRMGF VWOFlffNiSLRPVACRHC 5 1 5 

: | : j | | | : : | : . | | : • • - • ! I • II I I I 5 I • • I I • • I • I I 
488 t n\mMTyy^T SKDEL KTYFMAANKNTKDLRR 537 

s 3 « ^Ji^fi^ 587 

566 PMHSHHHRAFSFSLPRPGRRGSRPPEIREEEVQTVEDGVFDIHL 609 

I . I : | : . : ...|| |.| : ..|:.. | -| 

588 PRGSMRSRI INTC NNSGSTPDEEIGLVSLACEEVFEDDDL 627 
FIGURE 15 



human CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

human CCCATCCCCC AGACTATCTC AAOGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

human TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCOGG CCOGGTTAGC CCCATGGGAA 180 

human CGCAGCGCCT GTGTGGCCQC GGGACTCAAG GCTGGCCTGG CTCAAGTGAA CAOCACGTCC 240 
mouse ** # tcag** **** a g**** t ********* *** a * g ***t> 

human AGGAGGCGAC CTCGTCCGCG GGTTTGCATT CTGGGGTQGA CGAGCTOOGG GTTCGGTCCG 300 

acagg 

I 

I 

mouse g**** # t**a **- # catt* # ********** «**aa**aa* g* # ct***** **a**aat**> 

human AGCCCGGTGG GAGQCTCCCG GAGCGCAGCC TGGGCCCAGC CCACCCOGCG CCGGCGGCC& 360 

mouse .#* a # t **## ****.** t g a *** t *f a*t ****t*t # * # ***-*tg**a •♦•**a****> 

human 3SGCAGGCAC CCTGGAOCTG GACAAGOGCT GCACGGTGGA GGACCTGCTC CGCGGGTGCA 420 

mouse **** ga **** t ********* ******** t * **** c ***** .*•**♦*#** *» t ## c ** t * > 

human TCGAAGCCTT CGATGACTCC GGGAAGGTGC GGGACCCGCA GCTGGTGOGC ATCTTCCTCA 480 

mouse •••#*♦#**# t ******** t ** a ******* * a * # t**a** *** a ****** *****t****> 

human TGATGCACCC CTOGTACATC CCCTCCTCTC AGCTGGCGGC CAAGCTGCTC CACATCTACC 540 

mouse •••••••••• ********* a •*£♦*****♦ ******* tt * g ** a ****** *#* t #*** t * > 

human AACAATCCCG GAAGGACAAC TCCAATTCCC TGCAGGTGAA AACGTGCCAC CTGGTCAGGT 600 

mouse * g ******** ********** .*****«* t * • a *«* a *##* ****** t *** t ****** *> 

human ACTGGATCTC CGCCTTGCCA GCGGAGTTTG ACTTGAACCC GGAGTTGGCT GAGCAGATCA 660 

mouse •**#*#•#*# a « ******** ♦• a **»«. c . •»••**«•«* a «*« c ***** ** a *******> 

human AGGAGCTGAA OGCTCTGCTA GACCAAGAAG GGAACCGACG GCACAGCAGC CTAATCGACA 720 

mouse ♦**♦**♦***. ******* t ** ********** ****»«* ca * ********** ** c ** > 

human TAGACAGCGT 730 

mouse * c ** g ** t ** 
FIGURE 16 



GCTGCCCCTCCCAAGTTCCTCCCTGTTGGCCAGGCATCCAGGTCTCCAGTCTCCGAGCTG 180 
rCPSQVPPCWPG!QVSSLRA> 
CGGAGAACCCACCGCCACATGCGGCTGCCCCTTTCCATTCGACCCTGTGGGGAGCCAGGC 240 
avNPPPHA AAPFHSTLWGAR> 
TTCCGGGGCCCCGTTCCTCCTGTGTGAACTGGGCCCCCCGCCCCCATTCCCAGACATCAA 300 
r prPRSSCVNWAPRPHSQTS> 
GGCCGCGTCTCCAGATAGCCACGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAA 360 
dpplOIATISFLAPHRSLSP> 
LtATTCCCATCTOTCCTAGCCCATCC-CCAGACTATCTCAAGGAC^ 420 
VYSHLVLAHPPDYLKDQLSP> 
GCCCCCGACCTCCACTAGGCCTGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGG 480 



R PRPPLGLCHPLPAGRRPVP> 

2 CCC ATGGGAAI 
p h g n 



GCCGGGTTAGCCCCATGGGAACGcagcgcc 



tgtgtggccgcgggactcaaggctggcctg 54 0 



r R V S "P M G T Q R L C G R G T Q G H P> 
G . * ^_^^ aacc tcatccQcgggtttgcattctggggtgg 600 



gctcaagtgaacagcacgtccaggaggcgacctcgtccgcgggtttgc, 

H \ 
TCGGTC 

CCCACCCCGCGCCGGCGGC^ 



gss eqhvqeatssaglhsg v> 

acgagctggGGGTTCGGTCCGAGCCCGGTGGG 

DELGVRSEPG G „.,.^ rirn r.TCa 720 



acgagctggGGGrTCGGTCCGAGCCCGG^ «« 



HPAPAAMAGT 



i j dldkgctv> 
FIGURE 17 
FIGURE 18 (Cont. I) 

SmaI/ApaI (both lost) 0.00 

ApaI/SmaI (both lost) 1.00 

Plasmid name: clone 16 in pGEX-3X 
Plasmid size: 6.00 kb 
FIGURE 18 (Cont. II) 




Plasmid name: clone 19 in pGEX-1 
Plasmid size: 6.00 kb 
HindIII 2.50 

Plasmid name: clone 5 in pGEM-11zf 
Plasmid size: 5.50 kb 
FIGURE 18 (Cont. IV) 

(lost)/SmaI (lost) 2.40 

Plasmid name: clone 27 in pGEX-2T 
Plasmid size: 7.50 kb 
FIGURE 19 

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp 
1 5 10 

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97 
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg 
15 20 25 

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145 
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala 
30 35 40 

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193 
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu 
50 55 60 

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241 
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val 
65 70 

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289 
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg 
80 85 

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337 
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg 
95 100 105 

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385 
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr 
110 115 120 125 

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433 
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln 
130 135 140 

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481 
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu 
145 150 155 

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529 
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile 

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577 
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys 
175 180 185 

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625 
FIGURE 19 (continued) 

Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg 
190 195 200 205 

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673 
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg 
210 215 220 

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721 
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg 
225 230 235 

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773 
Gly Ala Gly Pro * 
240 

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832 
FIGURE 20 

>sp|P08622|DNAJ_ECOLI DNAJ PROTEIN >pir||HHECOJ heat shock protein dnaJ - 
Escherichia coli >gi|145769 (M12565) heat shock protein dnaJ 
[Escherichia coli] >gi|216441 (D10483) dnaJ protein [Escherichia 
coli] 

Length = 376 

Score = 138 (63.7 bits), Expect = 1.2e-10, P = 1.2e-10 
Identities = 25/62 (40%), Positives = 39/62 (62%) 

' YYE+LGV A E***A* ♦ ♦ HPttt* G* — T E+ EAY VL* Q R ♦ 

Sbjct: 6 YYEIU^SKTAEEREIRKAYTOI^M^ 65 

Query: 95 YD 96 
YD 

Sbjct: 66 YD 67 
>gi|1703590 (U80439) contains similarity to a DNAJ-like domain [Caenorhabditis 
elegans] 
Length = 345 

Score = 98 (45.2 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12 
Identities = 17/37 (45%), Positives = 28/37 (75%) 

Query: 28 QRSRPSTYYELLGVHPGASTEEVKRAFFSKSKELHPD 64 

R TVYE+LGV A* E+K AF*+*SX*«-KPD 
Sbjct: 22 KKIRQRTHYEVU^/ESTATI^EIKi^ 58 

Score = 74 (34.1 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12 
Identities = 17/32 (53%), Positives = 19/32 (59%) 

Query: 71 SLHSRFVELSEAYRVLSREQSRRSYDDQLRSG 102 

S ♦ F*EL AY VL R RR YD QLR C 
Sbjct: 64 SATASFLELKNAYWLRR PADRRLYDYQLRGG 95 

Score = 39 (18.0 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12 
Identities = 10/42 (23%), Positives = 19/42 (45%) 

Query: 162 LLMLAGMGLHYIAFRKVKQMHLNFMDEKDRIITAFYNEARAR 203 

L***AG Y* Q ♦ I . F ♦ R 

Sbjct: 158 LVXVAGYNGGYLYLLAYNQKQLDKL IDEDE LAKCFLRQKEFR 199 



>gnl|PID|e281266 (Z81030) C01G10.12 [Caenorhabditis elegans] 
Length = 191 

Score = 96 (44.3 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09 
Identities = 17/41 (41%), Positives = 27/41 (65%) 

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSR 75 

YYE++GV A* +E+* AF K+K+LHPIH ♦ SR 
Sbjct: 19 YYE I IGVSASATRQE I RDAFLKKTKQLHPDQSRKS SKSDSR 59 

Score = 54 (24.9 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09 
Identities = 10/22 (45%), Positives = 15/22 (68%) 

Query: 75 RFVELSEAYRVLSREQSRRSYD 96 

♦F+ ♦ EAY VL E* R* YD 
Sbjct: 71 QFMLVT^EAYDvT-W!EEKRKEYD 92 

Score = 35 (16.1 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09 
Identities = 9/44 (20%), Positives = 22/44 (50%) 

Query: 141 QGPQLRQQQHKQNKQVLGYCLLLMLAGMGLHYIAFRKVKQMHLN 184 

♦ P+ KQ **L **A *G ♦ ♦ RK** L* 

Sbjct- 145 RNPEDEYXABCQIWHMLV^^ 188 
FIGURE 22 

>sp|Q10209|YAY1_SCHPO HYPOTHETICAL 44.8 KD PROTEIN C4H3.01 IN CHROMOSOME I 
>gi|1184014 (Z69380) unknown [Schizosaccharomyces pombe] 
Length = 392 

Score = 84 (38.8 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08 
Identities = 13/36 (36%), Positives = 25/36 (69%) 

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNP 70 

YY+LLO A* — K*A* ♦ «■ HPD**P *P 
Sbjct: 9 YYDLLG ISTDATAVDIKKAYRKLAVKYHPDKNPDDP 44 

Score = 64 (29.5 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08 
Identities = 14/40 (35%), Positives = 23/40 (57%) 

Query: 75 RFVELSEAYRVLSREQSRRSYDDQLRSGSPPKSPRTTVHD 114 

*F ++SEAY+VL E«- R YD ♦ ♦ P* T *D 
Sbjct: 50 KFQK I SEAYQVLGDEKIJISQYDQFGKEKAVPEQGFTDAYD 89 

Score = 37 (17.1 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08 
Identities = 9/29 (31%), Positives = 15/29 (51%) 

Query: 190 DRIITAFYNEARARARANRGILQQERQRL 218 

DR A E A A+ «- RQR* 
Sbjct: 149 DRKKNAQIREREALAKREQEMIEDRRQRI 177 

Score = 33 (15.2 bits). Expect = 0.00081. Sum P(3) = 0.00081 
Identities = 8/19 (42%). Positives = 11/19 (57%) 

Query: 140 PQGPQLRQQQHKQNKQVLG 158 

PQG * Q+ * QVLG 
Sbjct: 44 PQGASEKFQKISEAYQVLG 62 



FIGURE 23 

>gnl|PID|e253406 (X77635) tumorous imaginal discs [Drosophila virilis] 
>gnl|PID|e263866 (Y07700) Tid58 protein [Drosophila virilis] 
Length = 529 

Score = 153 (70.6 bits), Expect = 9.7e-13, P = 9.7e-13 
Identities = 27/71 (38%). Positives = 44/71 (61%) 

Sbjct: 72 SSSiUaKDYYATLOTAKNA^ 131 

Query: 86 LSREQSRRSYD 96 

LS +Q RR TO 
Sbjct: 132 LSDDQKRRETO 142 
MCG18 MPPLLPLRLCRLWP-RN--PP SRLLGAA 

HDJ-2 MVKZTTYYUVbGVK PNATQEELKKAYKiOALKYH PDKN - - PN EGEKFKQ ISQAYEV 

HDJ- 1 MGKD - - YYQTLCLARGA5DEEI KPAYFRQALRYHPDKNKEPG AEEKFKEIAEAYW 

HSJ1 M-^--YYEILDVPRSASADDIiOCAYRRKAl^PDKN--P^ 



MCG18 AGQRSRPSTY--YELLGVHPGAST-EEVKRAFFS-- 

HDJ-2 LSDAKKREL YDKGGEQA IK EGGAGGG PGSPMDIFT3 HFFG QG 

HDJ- 1 LSDPRKREIFDRYGEB3LKGSGP SGG9GGGANGTSFSYTFHGDPHAMFAEFFG — 

HSJl LSDKHKREIYDRYGREGLTGTGTGPSRAEAGSGGP- -G--FTFT-FRSPEEVFREFPG-- 



MCG18 KSKELHPDRDPGNP SLHSRFVELSEAYRVLSREQSRRS--YDDQLRSGSPPKSPRT 

HDJ-2 GRMQRERRGKNNA/HQLSVTLroL YNGATRKLALQKNVI CDKCBC 

HDJ- 1 GRNPFTTrFTXXJRtGEEGMDIDDPFSGFPW^^ 

HSJl SGDPFAELFDDLGP- -FSELQNRGSRHSGPFFTFSSSFPGHSDFSSSSFSFSPGAGAFRS 



MCG18 TVHDKSAHQTHSSWTPPNAQY WSQFHSVRPQ GP QLRQQQHKQN 

HDJ-2 TOIQIRIHQIGPCWVQQIQSVCM^^ 

HDJ-1 HDLRVSLEE rYSGCTKKMK ISH-KRLNP — D GKSIRNEDKILTIEVKK 

HSJl VSTSTTFVQGRRITTRRIME NGQ-ERVEVEED GQ LKSVTIMGVPD 



MCG18 KQVLGYCLLL MLAGMGLHY IAFRKVKQMHLNFMDE-KDRIITAFYNEARARARAN 

HDJ-2 GMKDGQKITFHGEEDQEFGLEPGDI I IVLDQKDHAVTTRRGEDLFJC1©IQLVEALCGFQ 

HDJ- 1 GWKEGTK ITFPKBGDQTSNNI PADIVFV1J0DKPHNIFKRDGSDVIYPARISLREALCGCT 

HSJl DLARGLELSR- RE- -QQP-SVTSRSGGTQVQQTPASCPLD- SDLSEEEDLQLAMAYSLSE 



MCG18 RGILQQERQRLGQRQPP-PSEPTQGPEIVPRGAGP 

HDJ-2 KPISTIXEOTIVITSHPGQIVKHGDIKCVt^^ 

HDJ- 1 VNVPTLDGRTI PWFK - - DVTRPGMRRKVPGEGLPLPKTPEKRGDLI IEFEVIFPER- - 1 

HSJl KEAAGKKPAQGREAQHR-RQGRPRPSTOIQAW3GP- -RR — VRG — VKQPNAVHPQR-RR 



MCG 18 

HDJ-2 SPDKLSU^XLLPERKEVEETDQIDQVEL 

HDJ-1 PC/TSRTVLEQVLPI 

HSJl PLAASSSEHRAQPD LIQILTGGSDSLWEEKRGVS 

MCG 18 

HDJ-2 QTS 



HDJ-1 
HSJl 



= amino acid identity in all 4 proteins 
s conservative substitution 
FIGURE 25 



CAAGGAGCCTCTGCCTGCCCGTCGTCGTCATGCCGTCCCTGTTGCTCCAGCTGCCCCTGC 60 

MPSLLLQLPL 10 
GCCTATGCCGGCTGTGGCCGCATAGCCTTTCCATCCGACTTCTCACAGCCGCCACAGGGC 120 

RLCRLWPHSLS I RLLTAATG 30 
AGCGGTCTGTCCCTACTAATTACTATGAATTGTTGGGCGTGCATCCGGGTGCCAGCGCTG 180 

QRSVPTNYYEL LGVHPGASA 50 
AAGAGATTAAACGTGCTTTTTTCACCAAGTCAAAAGAGCTACACCCTGATCGAGACCCTG 240 
EEIKRAFFTK SKELHPDRDP 70 
GGAACCCAGCCCTGCATAGCCGCTTTGTGGAGCTGAATGAGGCATATCGAGTGCTCAGTC 300 
GNPALHSRFVELNEAYRVLS 90 
GTGAGGAAAGTCGTCGTAACTATGACCACCAGCTGCATTCAGCCAGTCCTCCAAAGTCTT 360 

RE E S RRNYDHQ LHSASPPK S 110 
CAGGGAGCACAGCCGAGCCTAAGTATACGCAACAGACACACAGCAGCTCCTGGGAACCCC 420 

S G S 
CCAACGCTCAAT? 

. . ^uucVRPOGPESR 150 
PNAQ.YWAQFHSVRi'Uur=. 

AGCAGCAGCC 

KQQRKHMQRVLGYCLLLMVA X70 

GCATGGGCCTGCACTAT 

G M G L HYVAFRK L EQVHRSFM 190 
ATGAAAAGGACCGGATCATTACAGCCATCTACAATGACACTCGGGCCAGGGCCAGGGCCA 660 
D E K D R I I T A I Y N D T R A R A R A 210 
ACAGAGCCAGGATTCAGCAGGAGCGCCArGAGAGGCAGCAGCCTCGGGCAGAACCCTCCC "720 

NRAR I QQERHE RQQPRAEP 230 
TGCCTCCAGAAAGCTCCAGGATCATGCCCCAGGACACAAGCCCCTGAGAGGCTTAACTAA 780 

LPPESSRIMPQDTSP* 245 
ATGGGACCTTCATTGGTCCTCTCCCTGCTGCCTGTCCAGAACTACACGTGCAATAAACTC 840 

849 

ATTTTCAG(A)n 



TAEPKYTQQTHSS SWEP 130 
rACTGGGCCCAGTTCCACAGTGTGAGGCCGCAGGGGCCGGAGTCAAGGA 480 

YWA Q FH S VRPQGPESR 
rGTAAACACAACCAGCGGGTCCTGGGGTACTGCCTCCTGCTCATGGTGGCAG 540 
1NQRVLGYCLLLMVA 
\TGTTGCCTTCAGGAAGCTGGAGCAGGTGCATCGCAGCTTCATGG 600 
FIGURE 26 
human MCG18 MPPLL---PLRLCRLWPRNPPSRLLGAAAGQRSRPSTYYELLGVHPGASTEEVKRAFFSK 
mouse MCG18 MPSLLLQLPLRLCRLWPHSLSIRLLTAATGQRSVPTNYYELLGVHPGASAEEIKRAFFTK 
            ** •* »•**•*•*• •#* * 

human MCG18 SKELHPDRDPGNPSIJISRFVELSEAYKVLSF^ 
mouse MCG18 SKIIJffDRDPGNPAUlSRF^reL^^ 
            •»•#*•••*•*** ••##*••** **• *• ** * • •••• ^* * 

human MCG18 HOTHSS-WPPf^YWSQFHSVRPQGPQLRQQQHKQ^ 
mouse MCG18 QtfTHSSSWEPPNAQYWAQFttSVRP^ 

human MCG18 KVKQMHLNFMDEKDRIITAFYNEARARARANRGILQQERQRLGQRQPPPSEPTQGPE--- 
mouse MCG18 KLEQVHRSFMDEKDRIITAIYNDTRARARANRARIQQER---HERQQPRAEPSLPPESSR 
            • * • *********** ••*••**••* **#* ( ** * ** 

human MCG18 IVPRGAGP 
mouse MCG18 IMPQDTSP 
FIGURE 27 

ttgaagtctagccccatcctggtccaatgcgctcttggtagcctcctttcccagctgccc 60 
*SLAPSWSNALLVASFPSCP 

gcccgccgccATGCCGCCCTTACTGCCCCTGCGCCTGTGCCGGCTGTGGCCCCGCAACCC 120 
PAAMPPLLPLRLCRLWPRNP> 
FIGURE 28 

[Northern blot image: lanes 1-5, with the 28S and 18S rRNA positions and the MCG18 transcript indicated] 
THREE NOVEL GENES ENCODING A ZINC FINGER PROTEIN, A GUANINE NUCLEOTIDE EXCHANGE FACTOR 
AND A HEAT SHOCK PROTEIN OR HEAT SHOCK BINDING PROTEIN 

FIELD OF THE INVENTION 

The present invention relates generally to a novel human gene and its derivatives and to 
mammalian, animal, insect, nematode, avian and microbial homologues thereof. The present 
invention further provides pharmaceutical compositions and diagnostic agents as well as genetic 
molecules useful in gene replacement therapy and recombinant molecules useful in protein 
replacement therapy. 

BACKGROUND OF THE INVENTION 

Bibliographic details of the publications referred to by author in this specification are collected 
at the end of the description. 

The increasing sophistication of recombinant DNA technology is greatly facilitating research and 
development in the medical and allied health fields. There is a growing need to develop 
recombinant and genetic molecules for use in diagnosis and in conventional pharmaceutical 
preparations as well as in gene and protein replacement therapies. 

In work leading up to the present invention, the inventors sought to identify and clone human 
genes which might be useful as potential diagnostic and/or therapeutic agents. Molecules of 
particular interest targeted by the inventors were gene regulators including regulatory proteins, 
signal transducers and heat shock proteins. 

Gene expression generally requires interaction between a regulatory protein and an appropriate 
recognition sequence of a target gene. Regulatory proteins comprise in many cases a domain or 
motif which facilitates binding to DNA. One particular motif comprises small sequence units 
repeated in tandem with each unit folded about a zinc atom to form separate structural domains. 
This motif is now referred to as a zinc finger domain. Such a domain is generally defined by the 
number of cysteine (C) and histidine (H) residues. 

In addition, knowledge of cellular interaction in the control of cell proliferation is essential in the 
rational design of specific therapeutic strategies aimed at controlling proliferative disorders. 
Such proliferative disorders include a range of cancers, inflammatory conditions and 
atherosclerosis. An important aspect of cellular interaction is in signal transduction via receptors 
to intracellular transducers. One key signal transducer is Ras which couples the receptors for 
diverse extracellular signals to different effectors. Ras directly activates the downstream kinase 
Raf which in turn induces the mitogen activated protein kinase (MAPK) cascade. 

Another regulatory mechanism involves heat shock proteins. The Escherichia coli heat shock 
protein, DnaJ, is the founding member of a family of proteins which are associated with protein 
folding, protein complex assembly and transit through subcellular components. 

Prokaryotic and eukaryotic DnaJ homologues have a modular organisation consisting of a J 
domain, a glycine-rich spacer, CXXCXGXG [SEQ ID NO:1] repeats and a C-terminal region 
with no obvious sequence features, as well as additional sequences for protein targeting. The 
J domain is anticipated to mediate interaction with heat shock 70 proteins (Hsp70) and consists 
of some 70 amino acids, frequently located at the N-terminus of the protein. 
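
By way of illustration only, the CXXCXGXG repeat referred to above can be located in a protein sequence with a simple pattern search. The following is a minimal Python sketch and forms no part of the specification: the function name `find_dnaj_repeats` and the toy sequence are hypothetical, and X is assumed to stand for any single residue.

```python
import re

# CXXCXGXG, reading X as "any single residue" (an assumption of this sketch).
DNAJ_REPEAT = re.compile(r"C..C.G.G")


def find_dnaj_repeats(protein):
    """Return (1-based start, matched residues) for each non-overlapping repeat."""
    return [(m.start() + 1, m.group()) for m in DNAJ_REPEAT.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any DnaJ protein.
    print(find_dnaj_repeats("AACAACAGAGTTCKKCKGKGAA"))
```

The search reports non-overlapping occurrences only; a real DnaJ homologue would normally be assessed against the full modular organisation described above rather than this motif alone.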

In accordance with the present invention, genes have been identified from the human genome 
which encode proteins having a regulatory role. One gene, in accordance with the present 
invention, encodes a protein with an N-terminal region resembling a zinc-finger domain of a novel 
type. Another gene encodes a protein involved in guanine nucleotide exchange factor (GEF) 
signalling pathways. Yet another gene encodes a protein which is a heat shock protein or heat 
shock-like protein which may have a role in tumour suppression. 

SUMMARY OF THE INVENTION 

Throughout this specification, unless the context requires otherwise, the word "comprise", or 
variations such as "comprises" or "comprising", will be understood to imply the inclusion of a 
stated element or integer or group of elements or integers but not the exclusion of any other 
element or integer or group of elements or integers. 

Sequence identity numbers (SEQ ID NOs.) for nucleotide and amino acid sequences referred to 
in the subject specification are defined after the bibliography. A summary of SEQ ID NOs. is 
also given in Table 1. 

One aspect of the present invention contemplates an isolated nucleic acid molecule comprising 
a sequence of nucleotides encoding or complementary to a sequence encoding an amino acid 
sequence having homology to a regulator of gene expression or a derivative of said gene 
regulator. 

Another aspect of the present invention provides an isolated nucleic acid molecule comprising 
a sequence of nucleotides encoding or complementary to a sequence encoding a regulator of 
gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 type. 

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:2 defines the gene, mcg4. This gene encodes 
a product, MCG4, having an amino acid sequence set forth in SEQ ID NO:3. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg4 
gene portion, which mcg4 gene portion is capable of encoding an MCG4 polypeptide or a 
functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg4, said method comprising determining the presence 
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one 
or both alleles of said mcg4 wherein the presence of such a nucleotide substitution, deletion 
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg4, said method comprising screening for a single or 
multiple amino acid substitution, deletion and/or addition to MCG4 wherein the presence of such 
a mutation is indicative of said condition or of a propensity to develop said condition. 

Another aspect of the present invention contemplates a method for detecting MCG4 or a 
derivative thereof in a biological sample, said method comprising contacting said biological 
sample with an antibody specific for MCG4 or its derivatives or homologues for a time and under 
conditions sufficient for an antibody-MCG4 complex to form, and then detecting said complex. 

A further aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a guanine nucleotide exchange factor (GEF) or a 
derivative thereof. 

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 
or 7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:4 or 6 defines the gene, mcg7. This gene 
encodes a product, MCG7, having an amino acid sequence set forth in SEQ ID NO:5 or 7. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg7 
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a 
functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg7, said method comprising determining the presence 
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one 
or both alleles of said mcg7 wherein the presence of such a nucleotide substitution, deletion 
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg7, said method comprising screening for a single or 
multiple amino acid substitution, deletion and/or addition to MCG7 wherein the presence of such 
a mutation is indicative of said condition or of a propensity to develop said condition. 

Another aspect of the present invention contemplates a method for detecting MCG7 or a 
derivative thereof in a biological sample, said method comprising contacting said biological 
sample with an antibody specific for MCG7 or its derivatives or homologues for a time and under 
conditions sufficient for an antibody-MCG7 complex to form, and then detecting said complex. 

Yet another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock binding protein 
or a derivative thereof. 

Another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:8 defines the gene, mcg18. This gene encodes 
a product, MCG18, having an amino acid sequence set forth in SEQ ID NO:9. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human 
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide 
or a functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg18, said method comprising determining the presence 
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one 
or both alleles of said mcg18 wherein the presence of such a nucleotide substitution, deletion 
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg18, said method comprising screening for a single 
or multiple amino acid substitution, deletion and/or addition to MCG18 wherein the presence of 
such a mutation is indicative of said condition or of a propensity to develop said condition. 

Another aspect of the present invention contemplates a method for detecting MCG18 or a 
derivative thereof in a biological sample, said method comprising contacting said biological 
sample with an antibody specific for MCG18 or its derivatives or homologues for a time and 
under conditions sufficient for an antibody-MCG18 complex to form, and then detecting said 
complex. 

A summary of SEQ ID NOs. referred to in the subject specification is shown in Table 1. 

TABLE 1 
SUMMARY OF SEQ ID NOs. 

SEQ ID NO.   DESCRIPTION 

1            amino acid repeat sequence in DnaJ homologues 
2            nucleotide sequence of mcg4 
3            amino acid sequence of MCG4 
4            nucleotide sequence of mcg7 
5            amino acid sequence of MCG7 
6            nucleotide sequence of mcg7 without exon of nucleotides 183-288 
7            amino acid sequence of MCG7 without exon of nucleotides 183-288 
8            nucleotide sequence of mcg18 
9            amino acid sequence of MCG18 
10-18        amino acid sequence identified using BESTFIT 
19           sequence of pGEX and mcg7 junction 
20           sequence of pGEX and mcg7 junction 
21           nucleotide sequence of myc-tag/mcg7 junction 
22           amino acid sequence corresponding to SEQ ID NO:21 
23           nucleotide sequence of pGEX and mcg7 junction 
24           amino acid sequence corresponding to SEQ ID NO:23 
25-36        mcg7-specific oligonucleotide 
37-45        mcg18-specific oligonucleotide 

Single and three letter abbreviations for amino acid residues are shown in Table 2. 

TABLE 2 

Amino Acid       Three-letter Abbreviation   One-letter Symbol 

Alanine          Ala                         A 
Arginine         Arg                         R 
Asparagine       Asn                         N 
Aspartic acid    Asp                         D 
Cysteine         Cys                         C 
Glutamine        Gln                         Q 
Glutamic acid    Glu                         E 
Glycine          Gly                         G 
Histidine        His                         H 
Isoleucine       Ile                         I 
Leucine          Leu                         L 
Lysine           Lys                         K 
Methionine       Met                         M 
Phenylalanine    Phe                         F 
Proline          Pro                         P 
Serine           Ser                         S 
Threonine        Thr                         T 
Tryptophan       Trp                         W 
Tyrosine         Tyr                         Y 
Valine           Val                         V 
Any residue      Xaa                         X 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 is a representation of the nucleotide sequence [SEQ ID NO:2] and corresponding 
amino acid sequence [SEQ ID NO:3] of mcg4. 

Figure 2 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial murine expressed sequence tag (EST). 

Figure 3 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial nematode EST. 

Figure 4 is a diagrammatic representation showing a predicted structure of MCG4 where H and 
C represent histidine and cysteine residues, respectively, and X refers to any amino acid residue. 
Zn represents zinc atoms. 

Figure 5 is a representation of a sensitive sequence homology search of related cysteine-containing 
motifs in another Caenorhabditis elegans protein. 

Figure 6 is a representation showing that a related cysteine-containing motif is present in the 
GATA-binding transcription factor from Schizosaccharomyces pombe. 

Figure 7 is a Northern blot showing expression of mcg4 in various cultured human cancer cell 
lines. Lanes 1-5, respectively, represent the hybridization signal from 15 µg total RNA derived 
from various human cancer cell lines. Lanes 1-5, respectively, contain RNA from H69 lung 
carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, HaCat transformed 
keratinocytes, T24 bladder carcinoma cells. 

Figure 8 is a representation of a partial alignment of mcg4 with human ESTs AA074703 and 
AA134788. 

Figure 9 is a representation of the partial nucleotide sequence alignment between a human 
(W32939) and mouse (AA242159) mcg4-like EST in the putative 5' UTR of the mcg4 cDNA. 
The putative initiation codon is underlined and the region upstream represents 5' UTR. 

Figure 10 is a representation showing MacVector alignment of MCG4 with forward translations 
of ESTs AA134788 and AA074703. The nucleotide sequences are shown in Figure 8. 

Figure 11 is a diagrammatic representation of the domains of MCG4: 

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C 
acidic domain consensus: 9/34 amino acids negatively charged, 0/34 positively charged 
basic domain consensus: 13/55 amino acids positively charged, 0/55 negatively charged 
leucine zipper domain consensus: LX6LX6RX6LX6L 
alternate "novel" leucine zipper-like motif where leucine would not be aligned along the one 
surface of an alpha helix domain: (aa 261) LX6LXLX6LXLX6L (aa 286). 

Figure 12 is a representation showing similarity of MCG7 with GEFs of various organisms. 

Figure 13(a) is a representation of the nucleotide sequence [SEQ ID NO:4] and corresponding 
amino acid sequence [SEQ ID NO:5] of mcg7. Nucleotides 183-288 are an alternatively spliced 
exon (shown in lower case). 

Figure 13(b) is a representation of the partial nucleotide sequence [SEQ ID NO:6] and 
corresponding amino acid sequence [SEQ ID NO:7] of mcg7 but without the exon shown in Fig. 
13(a). Amino acids have been numbered from the first methionine codon (underlined). The 
cDNA molecules of Fig. 13(a) and Fig. 13(b) differ by the inclusion and exclusion of the exon 
of nucleotides 183-288. 

Figure 14 is a representation showing a comparison between MCG7 and a homologue from 
Caenorhabditis elegans using the BESTFIT algorithm. In the figure, the following sequences 
are underlined: 



EF-HAND = PROSITE DATABASE NO. PDOC00018 

1a nematode DVDEEDEVEDIEF [SEQ ID NO:10] 
1b human    DVDGDGHISQEEF [SEQ ID NO:11] 
   nematode DHDRDGFISQEEF [SEQ ID NO:12] 
1c human    DQNQDGCISREEM [SEQ ID NO:13] 
   nematode DVDMDGQISKDEL [SEQ ID NO:14] 

GUANINE NT BINDING REGION = BLOCKS DATABASE NO. BL00720B 

2 human    HFVHVAEKLIJJLQNFNTLMA WGGLSHSSISRLKETH [SEQ ID NO:15] 
  nematode KFVHVAKHLRKINNFNTLMSVVGGITHSSVARLAKTY [SEQ ID NO:16] 

DAG-PE BINDING DOMAIN = PROSITE DATABASE NO. PDOC00379 

3 human    HNFQESNSLRPVACRHCKALILGIYKQGLKCRACGVNCHKQCKDRLSVEC [SEQ ID NO:17] 
  nematode HNFHETTFLTPTTCNHCNKLLWGILRQGFKCKDCGLAVHSCCKSNAVAEC [SEQ ID NO:18] 

Figure 15 is a representation of an alignment of human and a partial (5' UTR and partial coding 
sequence) murine mcg7 cDNA (GenBank Acc. No. W71787 and AA237373). The putative 
initiation codon is underlined. The murine sequence represents a composite of 2 partial cDNA 
sequences from the EST database (accession numbers W71787 and AA237373). Nucleotide 
differences between human and murine sequences are shown in lower case lettering and identical 
residues are indicated with asterisks. 

Figure 16 is a representation of further 5' nucleotide and corresponding amino acid sequence for 
human mcg7. Nucleotide positions 1-321 were derived from GenBank Acc. No. AC000134 and 
nucleotides 322 onwards from Fig. 13(a). Two in-frame initiation codons are underlined. 
Asterisks denote in-frame stop codons. 

Figure 17 is a graphical representation of a GDP release assay. □ Experiment #1 (mean of 
duplicates). 0 Experiment #2 (mean of duplicates). The exchange reaction contained 36 pmol 
of GST-MCG (N-terminally truncated; encoded by Construct B in Fig. 18) and 1.6-12.8 pmol 
of recombinant GST-N-Ras.GDP. Reaction time 6 min. 
Estimated reaction constants: 
Km = 2.1 µM, Vmax = 37 pmol/6 min/36 pmol [Expt #1] 
Km = 1.5 mM, Vmax = 30.3 pmol/6 min/36 pmol [Expt #2] 

Figure 18 depicts various recombinant plasmids containing partial or full-length mcg7. 

Figure 19 is a representation of the nucleotide sequence [SEQ ID NO:8] and corresponding 
amino acid sequence [SEQ ID NO:9] of mcg18. 

Figure 20 is a representation showing that MCG18 has partial homology to E. coli DnaJ. 

Figure 21 is a representation showing that MCG18 has homology to two Caenorhabditis elegans 
proteins. 

Figure 22 is a representation showing that MCG18 has homology to a Schizosaccharomyces pombe 
protein. 

Figure 23 is a representation showing homology of MCG18 to a Drosophila virilis protein. 

Figure 24 is a representation showing homology of MCG18 to human DnaJ proteins 
HDJ-2/HSDJ, HDJ-1/HSP40 and HSJ1. 

Figure 25 is a representation of the nucleotide and corresponding amino acid sequence of murine 
mcg18. 

Figure 26 is a representation of homology between human and murine MCG18. 

Figure 27 depicts nucleotide sequences corresponding to the 5' untranslated region of human 
mcg18. 

Figure 28 depicts a Northern blot showing expression of mcg18 transcripts in total RNA isolated 
from various human cancer cell lines grown in culture. Lanes 1-5 respectively contain 15 µg 
RNA from H69 lung carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, 
HaCat transformed keratinocytes, T24 bladder carcinoma cells. 
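
Several of the figures pair a nucleotide sequence with its corresponding amino acid sequence. Purely as an illustration of how the amino acid line follows from the nucleotide line, the sketch below translates the first 60 nucleotides shown in Figure 25 using the standard genetic code. It is a minimal Python sketch and not part of the specification: the function name is hypothetical, and treating the first ATG as the initiation codon is an assumption made only for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First nucleotide line of Figure 25 (positions 1-60).
FIG25_FIRST_LINE = "CAAGGAGCCTCTGCCTGCCCGTCGTCGTCATGCCGTCCCTGTTGCTCCAGCTGCCCCTGC"


def translate_from_first_atg(dna):
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    seq = dna.upper()
    start = seq.find("ATG")  # assumption: the first ATG is the initiation codon
    if start < 0:
        return ""
    residues = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unreadable codon
        if residue == "*":
            break
        residues.append(residue)
    return "".join(residues)


if __name__ == "__main__":
    print(translate_from_first_atg(FIG25_FIRST_LINE))  # MPSLLLQLPL
```

The result agrees with the first ten residues printed beneath that line in Figure 25.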

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

The present invention provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a regulator of gene expression or a derivative of said gene regulator. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
regulator of gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 
type. 

Still more particularly, the present invention provides an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The present invention also provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 
or 7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock-binding protein 
or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Preferably, the percentage similarity is at least about 50%. More preferably, the percentage 
similarity is at least about 60%. 



Reference herein to a low stringency at 42°C includes and encompasses from at least about 1% 
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for 
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative 
stringency conditions may be applied where necessary, such as medium stringency, which 
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and 
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M 
to at least about 0.9M salt for washing conditions, or high stringency, which includes and 
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least 
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least 
about 0.15M salt for washing conditions. 
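
The three stringency bands recited above can be summarised as a lookup. The following Python sketch is illustrative only and is not part of the specification: the function name is hypothetical, the ranges are treated as closed intervals, and only hybridisation conditions are considered (the washing salt ranges recited above are numerically the same).

```python
# (label, formamide % v/v range, salt molarity range) as recited in the text.
STRINGENCY_RANGES = [
    ("low", (1.0, 15.0), (1.0, 2.0)),
    ("medium", (16.0, 30.0), (0.5, 0.9)),
    ("high", (31.0, 50.0), (0.01, 0.15)),
]


def classify_stringency(formamide_pct, salt_molar):
    """Return 'low', 'medium' or 'high', or None if the pair fits no recited band."""
    for label, (f_lo, f_hi), (s_lo, s_hi) in STRINGENCY_RANGES:
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return label
    return None


if __name__ == "__main__":
    print(classify_stringency(10, 1.5))   # low
    print(classify_stringency(40, 0.1))   # high
```

Because the text qualifies every bound with "at least about", conditions falling between the bands (for example 15.5% v/v formamide) are reported as None here rather than being assigned to a band.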

The term "similarity" as used herein includes exact identity between compared sequences at the 
nucleotide or amino acid level. Where there is non-identity at the nucleotide level, "similarity" 
includes differences between sequences which result in different amino acids that are nevertheless 
related to each other at the structural, functional, biochemical and/or conformational levels. 
Where there is non-identity at the amino acid level, "similarity" includes amino acids that are 
nevertheless related to each other at the structural, functional, biochemical and/or conformational 
levels. 

The present invention extends to nucleic acid molecules with percentage similarities of 
approximately 65%, 70%, 75%, 80%, 85%, 90% or 95% or above or a percentage in between. 
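
As a rough illustration of how a percentage figure of this kind might be computed, the sketch below scores exact identity between two sequences that have already been aligned to equal length. It is a minimal Python sketch under stated assumptions and not part of the specification: the function names are hypothetical, gap positions are counted as mismatches, and only identity is scored, which is narrower than "similarity" as defined above because related but non-identical residues are not credited.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned columns holding the same non-gap residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=40.0):
    """True if identity reaches the threshold (40% is the lowest figure recited)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))  # 75.0
    print(meets_threshold("ACGT", "TGCA"))   # False
```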

The nucleic acid molecule of the present invention defined by SEQ ID NO:2 is hereinafter 
referred to as constituting the "mcg4" gene. The protein encoded by mcg4 is referred to herein 
as "MCG4" and has an amino acid sequence set forth in SEQ ID NO:3. The mcg4 gene is 
proposed to encode, in accordance with the present invention, a regulator of gene expression and 
comprises a novel zinc finger domain, (HC3)2. A regulator of gene expression includes a 
transcription factor. Regulation may be at the level of nucleic acid:protein or protein:protein 
interaction. 
The nucleic acid molecule of the present invention defined by SEQ ID NO:4 or 6 is hereinafter 
referred to as constituting the "meg 7" gene. The protein encoded by mcg7 is referred to herein 
25 as "MCG7" and has an amino acid sequence set forth in SEQ ID NO:5 or 7 and is involved in 
signal transduction. The difference in the nucleotide and amino acid sequence is due to the 
presence or absence of an exon at nucleotides 183-288. 

The nucleic acid molecule of the present invention defined by SEQ ID NO: 8 is hereinafter 
30 referred to as constituting the "mcgl8" gene. The protein encoded by mcgl8 is referred to 
herein as "MCG18" and comprises the amino acid set forth in SEQ ID NO:9. 

The present invention extends to the naturally occurring genomic mcg4, mcg7 and mcg18 
nucleotide sequences or corresponding cDNA sequences or to derivatives thereof. Derivatives 
contemplated in the present invention include fragments, parts, portions, mutants, homologues 
and analogues of MCG4, MCG7 or MCG18 or the corresponding genetic sequences. Derivatives 
also include single or multiple amino acid substitutions, deletions and/or additions to MCG4, 
MCG7 or MCG18 or single or multiple nucleotide substitutions, deletions and/or additions to 
mcg4, mcg7 or mcg18. "Additions" to the amino acid or nucleotide sequences include fusions 
with other peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference 
herein to "MCG4" or "mcg4", "MCG7" or "mcg7" or "MCG18" or "mcg18" includes reference to 
all derivatives thereof including functional derivatives and immunologically interactive derivatives 
of MCG4, MCG7 or MCG18. 

The mcg4, mcg7 and mcg18 of the present invention are particularly exemplified herein from 
humans and in particular from human chromosome 11q13. 

The present invention extends, however, to a range of homologues from, for example, primates, 
livestock animals (eg. sheep, cows, horses, donkeys, pigs), companion animals (eg. dogs, cats), 
laboratory test animals (eg. rabbits, mice, rats, guinea pigs), reptiles, birds (eg. chickens, ducks, 
geese, parrots), insects, nematodes, eukaryotic microorganisms and captive wild animals (eg. 
deer, foxes, kangaroos). Reference herein to mcg4, mcg7 and mcg18 or their respective proteins 
MCG4, MCG7 and MCG18 includes reference to these molecules of human origin as well as 
novel forms of non-human origin. 

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic 
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic 
acid molecules of the present invention are generally mRNA. 

Although the nucleic acid molecules of the present invention are generally in isolated form, they 
may be integrated into or ligated to or otherwise fused or associated with other genetic 
molecules such as vector molecules and in particular expression vector molecules. Vectors and 
expression vectors are generally capable of replication and, if applicable, expression in one or 
both of a prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli, 
Bacillus sp and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian 
and insect cells. 

Accordingly, another aspect of the present invention contemplates a genetic construct comprising 
a vector portion and an animal, more particularly a mammalian and even more particularly a 
human mcg4 gene portion, which mcg4 gene portion is capable of encoding an MCG4 
polypeptide or a functional or immunologically interactive derivative thereof. 

Preferably, the mcg4 gene portion of the genetic construct is operably linked to a promoter in
the vector such that said promoter is capable of directing expression of said mcg4 gene portion
in an appropriate cell.

In addition, the mcg4 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG4 is a transcription factor
involved in gene regulation. Mutations in mcg4 may result in aberrations in gene regulation
leading to the development of or a propensity to develop various types of cancer. In this regard,
although not wishing to limit the present invention to any one hypothesis or mode of action, it
is proposed that mcg4 or its expression product may be involved in the tissue-specific or
temporal regulation of particular genes.

A deletion or aberration in the mcg4 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of a subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg4, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg4 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg7
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a
functional or immunologically interactive derivative thereof.

Preferably, the mcg7 gene portion of the genetic construct is operably linked to a promoter on 
the vector such that said promoter is capable of directing expression of said mcg7 gene portion 
in an appropriate cell. 

In addition, the mcg7 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

It is proposed in accordance with the present invention that MCG7 is a GEF involved in signal 
transduction. Mutations in mcg7 or MCG7 may result in defective control of cell proliferation 
leading to the development of or a propensity to develop various types of cancer. 

A deletion or aberration in the mcg7 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents of a subject under investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg7, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg7 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Yet another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof.

Preferably, the mcg18 gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said mcg18 gene portion
in an appropriate cell.

In addition, the mcg18 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG18 is a transcription factor
involved in protein folding, protein complex assembly and transit through subcellular
compartments. MCG18 may also have a role in tumour suppression. Thus mutations in mcg18
may result in the development of or a propensity to develop various types of cancer.

A deletion or aberration in the mcg18 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of the subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg18, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg18 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

The nucleotide substitutions, additions or deletions may be detected by any convenient means
including nucleotide sequencing, restriction fragment length polymorphism (RFLP), polymerase
chain reaction (PCR), oligonucleotide hybridization and single-stranded conformation
polymorphism (SSCP) analysis amongst many others. An aberration includes a modification to
existing nucleotides, such as one which modifies a glycosylation signal, amongst other effects.
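
By way of illustration only, the classification of a sequencing result into substitutions, deletions
and additions may be sketched in Python as follows. The sketch assumes that the reference and
sample sequences have already been aligned by some other means, with "-" marking a gap. The
function name classify_aberrations and the example sequences are hypothetical and are not
sequences disclosed herein.

```python
def classify_aberrations(reference: str, sample: str) -> list[tuple[int, str, str, str]]:
    """Compare two pre-aligned sequences column by column.

    A '-' character marks an alignment gap. Each reported tuple is
    (position in the ungapped reference, kind, reference base, sample base).
    For an addition, the position is that of the last reference base before it.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")

    calls = []
    ref_pos = 0  # 1-based coordinate in the ungapped reference
    for ref_base, sample_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == sample_base:
            continue  # identical column, nothing to report
        if ref_base == "-":
            kind = "addition"      # base present only in the sample
        elif sample_base == "-":
            kind = "deletion"      # base present only in the reference
        else:
            kind = "substitution"  # both present but different
        calls.append((ref_pos, kind, ref_base, sample_base))
    return calls


# Hypothetical aligned fragments, for demonstration only.
for call in classify_aberrations("ATG-CCGT", "ATGACC-A"):
    print(call)
```

A real assay would obtain the alignment from a sequencing pipeline; the sketch only shows how each
aligned column maps onto the three kinds of aberration named above.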

In an alternative method, aberrations in the mcg4, mcg7 and mcg18 genes are detected by
screening for mutations in MCG4, MCG7 and MCG18, respectively.

A mutation in MCG4, MCG7 or MCG18 may be a single or multiple amino acid substitution,
addition and/or deletion. The mutation in mcg4, mcg7 or mcg18 may also result in either no
translation product being produced or a product in truncated form. A mutation may also result
in an altered glycosylation pattern or the introduction of side chain modifications to amino acid
residues.

According to this aspect of the present invention, there is provided a method of detecting a
condition caused or facilitated by an aberration in mcg4, mcg7 or mcg18, said method comprising
screening for a single or multiple amino acid substitution, deletion and/or addition to MCG4,
MCG7 or MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.
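
By way of illustration only, the manner in which a single nucleotide substitution can give rise to
a translation product in truncated form may be sketched in Python as follows. The sketch uses the
standard genetic code and assumes an in-frame coding sequence made up only of the bases A, C, G
and T. The function names translate and is_truncated, and the short example sequences, are
hypothetical and do not correspond to mcg4, mcg7 or mcg18.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame coding sequence up to, but excluding, the first stop codon."""
    protein = []
    for start in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[start:start + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_truncated(wild_type: str, mutant: str) -> bool:
    """Return True when the mutant sequence encodes a shorter product than the wild type."""
    return len(translate(mutant)) < len(translate(wild_type))


# Hypothetical sequences: the mutant changes codon 3 from TGG (Trp) to TGA (stop).
wild_type = "ATGAAATGGTGCTAA"
mutant = "ATGAAATGATGCTAA"
print(translate(wild_type), translate(mutant), is_truncated(wild_type, mutant))
```

Such a comparison identifies only premature termination; substitutions that leave the length of the
product unchanged would need a residue-by-residue comparison of the two translations.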

A particularly convenient means of detecting a mutation in MCG4, MCG7 or MCG18 is by use 
of antibodies. 

Accordingly, another aspect of the present invention is directed to antibodies to MCG4, MCG7
or MCG18 and their derivatives. Such antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to MCG4, MCG7 or MCG18 or may be specifically
raised to MCG4, MCG7 or MCG18 or derivatives thereof. In the case of the latter, MCG4,
MCG7 or MCG18 or their derivatives may first need to be associated with a carrier molecule.

The antibodies to MCG4, MCG7 or MCG18 of the present invention are particularly useful as
diagnostic agents.

For example, antibodies to MCG4, MCG7 or MCG18 and their derivatives can be used to screen
for wild-type MCG4, MCG7 or MCG18 or for mutated MCG4, MCG7 or MCG18 molecules.
The latter may occur, for example, during or prior to certain cancer development. A differential
binding assay is also particularly useful. Techniques for such assays are well known in the art
and include, for example, sandwich assays and ELISA. Knowledge of normal MCG4, MCG7
or MCG18 levels or the presence of wild-type MCG4, MCG7 or MCG18 may be important for
diagnosis of certain cancers or a predisposition for development of cancers or for monitoring
certain therapeutic protocols.

As stated above, antibodies to MCG4, MCG7 or MCG18 of the present invention may be
monoclonal or polyclonal or may be fragments of antibodies such as Fab fragments.
Furthermore, the present invention extends to recombinant and synthetic antibodies and to
antibody hybrids. A "synthetic antibody" is considered herein to include fragments and hybrids
of antibodies.

For example, specific antibodies can be used to screen for wild-type MCG4, MCG7 or MCG18
molecules or specific mutant molecules such as molecules having a certain deletion. This would
be important, for example, as a means for screening for levels of MCG4, MCG7 or MCG18 in
a cell extract or other biological fluid or purifying MCG4, MCG7 or MCG18 made by
recombinant means from culture supernatant fluid or purified from a cell extract. Techniques for
the assays contemplated herein are known in the art and include, for example, sandwich assays
and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An antibody
as contemplated herein includes any antibody specific to any region of wild-type MCG4, MCG7
or MCG18 or to a specific mutant phenotype or to a deleted or otherwise altered region.

Both polyclonal and monoclonal antibodies are obtainable by immunization of a suitable animal
or bird with MCG4, MCG7 or MCG18 or their derivatives and either type is utilizable for
immunoassays. The methods of obtaining both types of sera are well known in the art.
Polyclonal sera are less preferred but are relatively easily prepared by injection of a suitable
laboratory animal or bird with an effective amount of MCG4, MCG7 or MCG18 or antigenic
parts thereof or derivatives thereof, collecting serum from the animal or bird, and isolating
specific sera by any of the known immunoadsorbent techniques. Although antibodies produced
by this method are utilizable in virtually any type of immunoassay, they are generally less
favoured because of the potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation
of hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell
line and lymphocytes sensitized against the immunogenic preparation can be done by techniques
which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting MCG4, MCG7 or
MCG18 or a derivative thereof in a biological sample, said method comprising contacting said
biological sample with an antibody specific for MCG4, MCG7 or MCG18 or its derivatives or
homologues for a time and under conditions sufficient for an antibody-MCG4, MCG7 or
MCG18 complex to form, and then detecting said complex.

Preferably, the biological sample is a cell extract from a human or other animal or a bird. 

The presence of MCG4, MCG7 or MCG18 may be determined in a number of ways such as
by Western blotting and ELISA procedures. A wide range of immunoassay techniques are
available as can be seen by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653.
These include both single-site and two-site or "sandwich" assays of the non-competitive types,
as well as traditional competitive binding assays. These assays also include direct binding of a
labelled antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for use
in the present invention. A number of variations of the sandwich assay technique exist, and all
are intended to be encompassed by the present invention. Briefly, in a typical forward assay, an
unlabelled antibody is immobilized on a solid substrate and the sample to be tested brought into
contact with the bound molecule. After a suitable period of incubation, for a period of time
sufficient to allow formation of an antibody-antigen complex, a second antibody specific to the
antigen, labelled with a reporter molecule capable of producing a detectable signal is then added
and incubated, allowing time sufficient for the formation of another complex of
antibody-antigen-labelled antibody. Any unreacted material is washed away, and the presence of
the antigen is determined by observation of a signal produced by the reporter molecule. The
results may either be qualitative, by simple observation of the visible signal, or may be
quantitated by comparing with a control sample containing known amounts of hapten. Variations
on the forward assay include a simultaneous assay, in which both sample and labelled antibody
are added simultaneously to the bound antibody. These techniques are well known to those
skilled in the art, including any minor variations as will be readily apparent. In accordance with
the present invention the sample is one which might contain MCG4, MCG7 or MCG18 including
a cell extract or tissue biopsy. The sample is, therefore, generally a biological sample comprising
biological fluid but also extends to fermentation fluid and supernatant fluid such as from a cell
culture.

In the typical forward sandwich assay, a first antibody having specificity for MCG4, MCG7
or MCG18 or an antigenic part thereof or a derivative thereof or antigenic parts thereof, is either
covalently or passively bound to a solid surface. The solid surface is typically glass or a polymer,
the most commonly used polymers being cellulose, polyacrylamide, nylon, polystyrene, polyvinyl
chloride or polypropylene. The solid supports may be in the form of tubes, beads, discs or
microplates, or any other surface suitable for conducting an immunoassay. The binding
processes are well known in the art and generally consist of cross-linking, covalently binding or
physically adsorbing. The polymer-antibody complex is then washed in preparation for the test
sample. An aliquot of the sample to be tested is then added to the solid phase complex and
incubated for a period of time sufficient (e.g. 2-40 minutes or overnight if more convenient) and
under suitable conditions (e.g. from room temperature to 37 °C) to allow binding of any antigen
present to the antibody. Following the incubation period, the antibody-antigen solid phase is
washed and dried and incubated with a second antibody specific for a portion of the hapten. The
second antibody is linked to a reporter molecule which is used to indicate the binding of the
second antibody to the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody which may or may not be labelled with
a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex.
The complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most commonly
used reporter molecules in this type of assay are either enzymes, fluorophores or
radionuclide-containing molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield a
fluorescent product rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further quantitated,
usually spectrophotometrically, to give an indication of the amount of hapten which was present
in the sample. "Reporter molecule" also extends to use of cell agglutination or inhibition of
agglutination such as red blood cells or latex beads, and the like.
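
By way of illustration only, quantitation of a spectrophotometric signal against control samples
containing known amounts of hapten may be sketched in Python as follows. The sketch assumes a
signal that increases monotonically with the amount of hapten and interpolates linearly between
neighbouring controls. The function name interpolate_amount, the units and the calibration values
are hypothetical and are not experimental results.

```python
from bisect import bisect_left


def interpolate_amount(standards: list[tuple[float, float]], absorbance: float) -> float:
    """Estimate the amount of hapten from an absorbance reading.

    standards holds (known amount, measured absorbance) pairs from control samples.
    The estimate is a linear interpolation between the two nearest controls.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    signals = [signal for _, signal in points]
    if not signals[0] <= absorbance <= signals[-1]:
        raise ValueError("reading lies outside the calibrated range")

    index = bisect_left(signals, absorbance)
    if signals[index] == absorbance:
        return points[index][0]  # reading coincides with a control

    (low_amount, low_signal), (high_amount, high_signal) = points[index - 1], points[index]
    fraction = (absorbance - low_signal) / (high_signal - low_signal)
    return low_amount + fraction * (high_amount - low_amount)


# Hypothetical controls: (amount in ng, absorbance).
controls = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(interpolate_amount(controls, 0.35))
```

Readings outside the range spanned by the controls are rejected rather than extrapolated, since the
response of an enzyme-linked reporter is not necessarily linear beyond the calibrated range.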

Alternatively, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the fluorescent
labelled antibody is allowed to bind to the first antibody-hapten complex. After washing off the
unbound reagent, the remaining tertiary complex is then exposed to the light of the appropriate
wavelength; the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

As stated above, the present invention extends to genetic constructs capable of encoding MCG4,
MCG7 or MCG18 or functional derivatives thereof. Such genetic constructs are also
contemplated to be useful in modulating expression of specific genes in which mcg4, mcg7 or
mcg18 is involved in tissue-specific or temporal regulation.

Accordingly, another aspect of the present invention is directed to a genetic construct comprising
a nucleotide sequence encoding a peptide, polypeptide or protein and mcg4, mcg7 or mcg18 or
a functional derivative or homologue thereof capable of modulating the expression of said
nucleotide sequence.

As stated above, MCG18 is proposed to have a role in tumour suppression. Accordingly, it is
further proposed in accordance with the present invention to use recombinant MCG18 in
pharmaceutical preparations for treating, arresting or otherwise ameliorating the effects of certain
cancers.

Accordingly, another aspect of the present invention contemplates a method for treating,
arresting or otherwise ameliorating the effects of a cancer in an animal or bird, said method
comprising administering to said animal or bird an effective amount of MCG18 or a functional
derivative thereof for a time and under conditions sufficient to treat, arrest or otherwise
ameliorate the effects of said cancer.

The present invention, therefore, contemplates a pharmaceutical composition comprising
MCG18 or a derivative thereof or a modulator of mcg18 expression or MCG18 activity and one
or more pharmaceutically acceptable carriers and/or diluents. These components are referred
to hereinafter as the "active ingredients". The active ingredients may also include anti-cancer
agents or agents which facilitate actions of MCG18.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions (where
water soluble) and sterile powders for the extemporaneous preparation of sterile injectable
solutions. Such forms must be stable under the conditions of manufacture and storage and must be
preserved against the contaminating action of microorganisms such as bacteria and fungi. The
carrier may be a solvent medium containing, for example, water, ethanol, polyol (for example,
glycerol, propylene glycol and liquid polyethylene glycol and the like), suitable mixtures thereof,
and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating
such as lecithin and by the use of surfactants. Prevention of the action of microorganisms
can be brought about by various antibacterial and antifungal agents, for example, parabens,
chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable to
include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use in the compositions of agents delaying
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various of the other ingredients enumerated above, as
required, followed by filter sterilization. In the case of sterile powders for the preparation of
sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique which yield a powder of the active ingredient plus any additional desired
ingredient from a previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, or they may be compressed into tablets, or they may be incorporated
directly with the food of the diet. For oral therapeutic administration, the active compound may
be incorporated with excipients and used in the form of ingestible tablets, buccal tablets, troches,
capsules, elixirs, suspensions, syrups, wafers, and the like. Such compositions and preparations
should contain at least 1% by weight of active compound. The percentage of the compositions
and preparations may, of course, be varied and may conveniently be between about 5 to about
80% of the weight of the unit. The amount of active compound in such therapeutically useful
compositions is such that a suitable dosage will be obtained. Preferred compositions or
preparations according to the present invention are prepared so that an oral dosage unit form
contains between about 0.1 µg and 2000 mg of active compound.

The tablets, troches, pills, capsules and the like may also contain the components as listed
hereafter. A binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin may be added or a flavouring agent such as peppermint, oil of wintergreen, or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and flavouring
such as cherry or orange flavour. Of course, any material used in preparing any dosage unit form
should be pharmaceutically pure and substantially non-toxic in the amounts employed. In
addition, the active compound(s) may be incorporated into sustained-release preparations and
formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and
the like. The use of such media and agents for pharmaceutically active substances is well known
in the art. Except insofar as any conventional media or agent is incompatible with the active
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active
ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated, each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 µg to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.

Effective amounts contemplated by the present invention include those amounts effective to
ameliorate a condition. For example, it is envisaged that effective amounts would range from
about 0.001 µg/kg body weight to about 100 mg/kg body weight. Alternatively, effective
amounts may range from about 0.01 µg/kg body weight to about 10 mg/kg body weight or even
from about 0.1 µg/kg body weight to about 1 mg/kg body weight. Administration may be per
minute, hour, day, week, month or year or may only be a once-off administration.
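
By way of illustration only, the conversion of a weight-based range into absolute amounts for a
given subject may be sketched in Python as follows. The sketch takes the broadest range stated
above, read here as about 0.001 micrograms per kg to about 100 mg per kg body weight. The
function name dose_range_mg and the 70 kg body weight are hypothetical, and the result is an
arithmetic example rather than a dosing recommendation.

```python
def dose_range_mg(
    body_weight_kg: float,
    low_ug_per_kg: float = 0.001,
    high_mg_per_kg: float = 100.0,
) -> tuple[float, float]:
    """Return the (lowest, highest) absolute amount in mg for a given body weight.

    The lower bound is supplied in micrograms per kg and converted to mg
    (1 mg = 1000 micrograms); the upper bound is supplied in mg per kg.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low_mg = body_weight_kg * low_ug_per_kg / 1000.0
    high_mg = body_weight_kg * high_mg_per_kg
    return low_mg, high_mg


# Hypothetical 70 kg subject.
low, high = dose_range_mg(70.0)
print(f"{low:.5f} mg to {high:.0f} mg")
```

The narrower ranges can be evaluated in the same way by passing their bounds as the second and
third arguments.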

The pharmaceutical composition may also comprise genetic molecules such as a vector capable
of transfecting target cells where the vector carries a nucleic acid molecule capable of modulating
mcg18 expression or MCG18 activity. The vector may, for example, be a viral vector.

As stated above, the present invention further contemplates a range of derivatives of MCG18.

Derivatives include fragments, parts, portions, mutants, homologues and analogues of the
MCG18 polypeptide and corresponding genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or additions to MCG18 or single or multiple
nucleotide substitutions, deletions and/or additions to the genetic sequence encoding MCG18.
"Additions" to amino acid sequences or nucleotide sequences include fusions with other
peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference herein to
"MCG18" includes reference to all derivatives thereof including functional derivatives or
immunologically interactive derivatives of MCG18.

Analogues of MCG18 contemplated herein include, but are not limited to, modification to side
chains, incorporation of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or its analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with
pyridoxal-5-phosphate followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal.

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides.
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine
and/or D-isomers of amino acids. A list of unnatural amino acids contemplated herein is shown
in Table 3.
TABLE 3

Non-conventional amino acid     Code      Non-conventional amino acid       Code

α-aminobutyric acid             Abu       L-N-methylalanine                 Nmala
α-amino-α-methylbutyrate        Mgabu     L-N-methylarginine                Nmarg
aminocyclopropane-carboxylate   Cpro      L-N-methylasparagine              Nmasn
                                          L-N-methylaspartic acid           Nmasp
aminoisobutyric acid            Aib       L-N-methylcysteine                Nmcys
aminonorbornyl-carboxylate      Norb      L-N-methylglutamine               Nmgln
                                          L-N-methylglutamic acid           Nmglu
cyclohexylalanine               Chexa     L-N-methylhistidine               Nmhis
cyclopentylalanine              Cpen      L-N-methylisoleucine              Nmile
D-alanine                       Dal       L-N-methylleucine                 Nmleu
D-arginine                      Darg      L-N-methyllysine                  Nmlys
D-aspartic acid                 Dasp      L-N-methylmethionine              Nmmet
D-cysteine                      Dcys      L-N-methylnorleucine              Nmnle
D-glutamine                     Dgln      L-N-methylnorvaline               Nmnva
D-glutamic acid                 Dglu      L-N-methylornithine               Nmorn
D-histidine                     Dhis      L-N-methylphenylalanine           Nmphe
D-isoleucine                    Dile      L-N-methylproline                 Nmpro
D-leucine                       Dleu      L-N-methylserine                  Nmser
D-lysine                        Dlys      L-N-methylthreonine               Nmthr
D-methionine                    Dmet      L-N-methyltryptophan              Nmtrp
D-ornithine                     Dorn      L-N-methyltyrosine                Nmtyr
D-phenylalanine                 Dphe      L-N-methylvaline                  Nmval
D-proline                       Dpro      L-N-methylethylglycine            Nmetg
D-serine                        Dser      L-N-methyl-t-butylglycine         Nmtbug
D-threonine                     Dthr      L-norleucine                      Nle
D-tryptophan                    Dtrp      L-norvaline                       Nva
D-tyrosine                      Dtyr      α-methyl-aminoisobutyrate         Maib
D-valine                        Dval      α-methyl-γ-aminobutyrate          Mgabu
D-α-methylalanine               Dmala     α-methylcyclohexylalanine         Mchexa
D-α-methylarginine              Dmarg     α-methylcyclopentylalanine        Mcpen
D-α-methylasparagine            Dmasn     α-methyl-α-naphthylalanine        Manap
D-α-methylaspartate             Dmasp     α-methylpenicillamine             Mpen
D-α-methylcysteine              Dmcys     N-(4-aminobutyl)glycine           Nglu
D-α-methylglutamine             Dmgln     N-(2-aminoethyl)glycine           Naeg
D-α-methylhistidine             Dmhis     N-(3-aminopropyl)glycine          Norn
D-α-methylisoleucine            Dmile     N-amino-α-methylbutyrate          Nmaabu
D-α-methylleucine               Dmleu     α-naphthylalanine                 Anap
D-α-methyllysine                Dmlys     N-benzylglycine                   Nphe
D-α-methylmethionine            Dmmet     N-(2-carbamylethyl)glycine        Ngln
D-α-methylornithine             Dmorn     N-(carbamylmethyl)glycine         Nasn
D-α-methylphenylalanine         Dmphe     N-(2-carboxyethyl)glycine         Nglu
D-α-methylproline               Dmpro     N-(carboxymethyl)glycine          Nasp
D-α-methylserine                Dmser     N-cyclobutylglycine               Ncbut
D-α-methylthreonine             Dmthr     N-cycloheptylglycine              Nchep
D-α-methyltryptophan            Dmtrp     N-cyclohexylglycine               Nchex
D-α-methyltyrosine              Dmty      N-cyclodecylglycine               Ncdec
D-α-methylvaline                Dmval     N-cyclododecylglycine             Ncdod
D-N-methylalanine               Dnmala    N-cyclooctylglycine               Ncoct
D-N-methylarginine              Dnmarg    N-cyclopropylglycine              Ncpro
D-N-methylasparagine            Dnmasn    N-cycloundecylglycine             Ncund
D-N-methylaspartate             Dnmasp    N-(2,2-diphenylethyl)glycine      Nbhm
D-N-methylcysteine              Dnmcys    N-(3,3-diphenylpropyl)glycine     Nbhe
D-N-methylglutamine             Dnmgln    N-(3-guanidinopropyl)glycine      Narg
D-N-methylglutamate             Dnmglu    N-(1-hydroxyethyl)glycine         Nthr
D-N-methylhistidine             Dnmhis    N-(hydroxyethyl)glycine           Nser
D-N-methylisoleucine            Dnmile    N-(imidazolylethyl)glycine        Nhis
D-N-methylleucine               Dnmleu    N-(3-indolylethyl)glycine         Nhtrp
D-N-methyllysine                Dnmlys    N-methyl-γ-aminobutyrate          Nmgabu
N-methylcyclohexylalanine       Nmchexa   D-N-methylmethionine              Dnmmet
D-N-methylornithine             Dnmorn    N-methylcyclopentylalanine        Nmcpen
N-methylglycine                 Nala      D-N-methylphenylalanine           Dnmphe
N-methylaminoisobutyrate        Nmaib     D-N-methylproline                 Dnmpro
N-(1-methylpropyl)glycine       Nile      D-N-methylserine                  Dnmser
N-(2-methylpropyl)glycine       Nleu      D-N-methylthreonine               Dnmthr
D-N-methyltryptophan            Dnmtrp    N-(1-methylethyl)glycine          Nval
D-N-methyltyrosine              Dnmtyr    N-methyl-α-naphthylalanine        Nmanap
D-N-methylvaline                Dnmval    N-methylpenicillamine             Nmpen
γ-aminobutyric acid             Gabu      N-(p-hydroxyphenyl)glycine        Nhtyr
L-t-butylglycine                Tbug      N-(thiomethyl)glycine             Ncys
L-ethylglycine                  Etg       penicillamine                     Pen
L-homophenylalanine             Hphe      L-α-methylalanine                 Mala
L-α-methylarginine              Marg      L-α-methylasparagine              Masn
L-α-methylaspartate             Masp      L-α-methyl-t-butylglycine         Mtbug
L-α-methylcysteine              Mcys      L-methylethylglycine              Metg
L-α-methylglutamine             Mgln      L-α-methylglutamate               Mglu
L-α-methylhistidine             Mhis      L-α-methylhomophenylalanine       Mhphe
L-α-methylisoleucine            Mile      N-(2-methylthioethyl)glycine      Nmet
L-α-methylleucine               Mleu      L-α-methyllysine                  Mlys
L-α-methylmethionine            Mmet      L-α-methylnorleucine              Mnle
L-α-methylnorvaline             Mnva      L-α-methylornithine               Morn
L-α-methylphenylalanine         Mphe      L-α-methylproline                 Mpro
L-α-methylserine                Mser      L-α-methylthreonine               Mthr
L-α-methyltryptophan            Mtrp      L-α-methyltyrosine                Mtyr
L-α-methylvaline                Mval      L-N-methylhomophenylalanine       Nmhphe
N-(N-(2,2-diphenylethyl)        Nnbhm     N-(N-(3,3-diphenylpropyl)         Nnbhe
  carbamylmethyl)glycine                    carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-      Nmbc
  ethylamino)cyclopropane

Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6,
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In
addition, peptides can be conformationally constrained by, for example, incorporation of Cα and
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming an
amide bond between the N and C termini, between two side chains or between a side chain and
the N or C terminus.

Such analogues also apply in respect of MCG4 and MCG7. 

The present invention further contemplates chemical analogues of MCG18 capable of acting as
antagonists or agonists of MCG18 or which can act as functional analogues of MCG18.
Chemical analogues may not necessarily be derived from MCG18 but may share certain
conformational similarities. Alternatively, chemical analogues may be specifically designed to
mimic certain physicochemical properties of MCG18. Chemical analogues may be chemically
synthesised or may be detected following, for example, natural product screening.

The identification of MCG18 permits the generation of a range of therapeutic molecules capable
of modulating expression of MCG18 or modulating the activity of MCG18. Modulators
contemplated by the present invention include agonists and antagonists of MCG18 expression.
Antagonists of MCG18 expression include antisense molecules, ribozymes and co-suppression
molecules. Agonists include molecules which increase promoter ability or interfere with negative
regulatory mechanisms. Agonists of MCG18 include molecules which overcome any negative
regulatory mechanism. Antagonists of MCG18 include antibodies and inhibitor peptide
fragments.

These types of modifications may be important to stabilise MCG18 if administered to an
individual or for use as a diagnostic reagent.

Other derivatives contemplated by the present invention include a range of glycosylation variants
from a completely unglycosylated molecule to a modified glycosylated molecule. Altered
glycosylation patterns may result from expression of recombinant molecules in different host
cells.

Another embodiment of the present invention contemplates a method for modulating expression
of MCG18 in a human, said method comprising contacting the mcg18 gene encoding MCG18
with an effective amount of a modulator of mcg18 expression for a time and under conditions
sufficient to up-regulate or down-regulate or otherwise modulate expression of mcg18. For
example, a nucleic acid molecule encoding MCG18 or a derivative thereof may be introduced
into a cell to facilitate protection of that cell from becoming cancerous.

Another aspect of the present invention contemplates a method of modulating activity of MCG18
in a human, said method comprising administering to said human a modulating effective amount
of a molecule for a time and under conditions sufficient to increase or decrease MCG18 activity.
The molecule may be a proteinaceous molecule or a chemical entity and may also be a derivative
of MCG18 or a chemical analogue or truncation mutant of MCG18.



The present invention is further described with reference to the following non-limiting Examples. 
EXAMPLE 1

A human gene (designated mcg4) was identified on chromosome 11q13 that on the basis of
sequence homology is predicted to encode a putative transcription factor of 310 amino acids
(Fig. 1). mcg4 is transcribed in several different cell lines (Fig. 7).

EXAMPLE 2 

The expressed sequence tag (EST) database contains partial sequence data for the murine
(Fig. 2) and nematode (Fig. 3) homologues of mcg4.

EXAMPLE 3 

MCG4 contains a sequence of cysteine residues within the N-terminal region of the protein that
resembles zinc-finger binding domains of a novel type, i.e. (HC3)2 [Fig. 4].

EXAMPLE 4 

Sensitive sequence homology searches reveal that related cysteine-containing motifs are present
in another C. elegans protein (Fig. 5) as well as the GATA-binding transcription factor from
S. pombe (Fig. 6).

EXAMPLE 5

mcg4 is expected to have commercial value because it is likely to encode a novel transcription
factor that is highly conserved amongst organisms, suggesting an integral role in gene regulation.
mcg4 may also be involved in tissue-specific or temporal regulation of certain genes,
thus making it a potential target for modulating expression of those downstream effectors.



EXAMPLE 6

Nucleotide sequence data generated from cosmid clone cSRL-72c4 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned to
the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al 1990) and were found to match numerous human and mouse entries (Table 4 and Figure 2).
These matching ESTs were further used to identify overlapping entries in the EST database
(Table 5). The nucleotide sequences of these human ESTs were compiled using MacVector
4.2.1 software (IBI-Kodak) to produce the cDNA sequence shown in Figure 1. EST entries
AA074703 and AA134788 are closely related at the nucleotide level to mcg4 and it is, therefore,
likely that mcg4 is a member of a newly discovered gene family (Figure 8).

The cDNA sequence of mcg4 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al 1990) at
the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). As the
protein appeared to be novel, a translation of the longest reading frame for the mcg4 cDNA was
aligned to the EST database using the program TBLASTN, which performed a dynamic
translation of the EST database in all 6 frames. The search results indicated that the nematode
C. elegans had an MCG4-like protein (Figure 3), with the matching domains containing a spatial
sequence of cysteine and histidine residues which resembled a zinc-finger structure (Figure 4).
The program BLASTP was used, therefore, to conduct sensitive searches of the protein
databases for similar zinc-finger motifs. A weak match to the putative zinc-finger domain was
observed for another protein from C. elegans (Figure 5) and a poorer match for the
GATA-binding transcription factor from S. pombe (Figure 6). The putative initiation codon of
human mcg4 is not preceded by an in-frame stop codon and it is therefore possible that the cDNA
described in Figure 1 is a truncated form. However, sequence alignment of human and mouse
mcg4 ESTs showed a lower degree of nucleotide conservation prior to the assigned initiation
codon, thus supporting the notion that the region represents the 5' UTR (Figure 9).

To determine the expression pattern of mcg4, 15 µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture were electrophoresed through 1.2% w/v
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1989). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabelled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg4. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg4 is expressed as a 1.6kb
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 7).
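
The six-frame translation that TBLASTN applies to each EST entry can be illustrated with a short sketch. This is only an illustration of the principle and not the BLAST implementation; the function names (reverse_complement, translate, six_frame_translation) are hypothetical, the codon table is the standard genetic code, and the test sequence is the myc-tag coding sequence quoted in Example 15.

```python
# Standard genetic code in the conventional TCAG ordering (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a nucleotide sequence in three forward and three reverse frames."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# myc-tag coding sequence as quoted in Example 15 (used here only as test input).
MYC_TAG = "ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG"

for frame, protein in six_frame_translation(MYC_TAG).items():
    print(frame, protein)
```

Frame +1 of the test sequence reads MEQKLISEEDL, the myc epitope. A protein query is compared against all six such translations of every database entry, which is why a protein-level match can be found in ESTs whose reading frame and strand are unknown.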

EXAMPLE 7 

A human gene (designated mcg7) was identified and isolated from chromosome 11q13 which
encodes a protein that bears striking homology with guanine nucleotide exchange factors (GEFs)
from a wide variety of organisms (Fig. 12).

EXAMPLE 8 

The composite mcg7 cDNA sequence is at least 2.4kb in length and Figure 13(a) shows a
predicted translation product of at least 609 amino acids beginning at methionine 120. An
alternative start site due to alternate exon splicing (indicated in lower case) may yield a protein
of 671 amino acids starting at methionine 58 (Fig. 13a).

EXAMPLE 9

An mcg7 homologue from C. elegans has been identified, the product of which is highly
conserved with that of MCG7 (Fig. 14). There are several salient features of the protein which
have been underlined in Fig. 14, namely: a guanine nucleotide binding region, a diacylglycerol
binding region, and "EF-hand" calcium binding regions. In addition, there are several potential
cAMP, protein kinase C, and casein kinase II phosphorylation sites, as well as a number of
potential sites for glycosylation (not indicated).



EXAMPLE 10

A number of partial human and murine EST clones exist for mcg7. The GenBank database
contains a cDNA (Acc. no. Y12336) encoding a full-length open reading frame (ORF) for human
mcg7 as well as a partial murine mcg7 ORF (Y12339). In addition, the complete genomic
sequence of the human mcg7 gene is contained within GenBank entry AC000134.

EXAMPLE 11

The best characterised GEFs are members of the family of ras oncoproteins, which play a pivotal
role in signal transduction and when mutated are responsible for tumour development. A variety
of therapeutic regimes for cancer treatment have been designed to specifically interfere with the
ras signalling pathways. There is potential, therefore, that the product of mcg7 could also be a
target for such clinical strategies.

EXAMPLE 12 

The nucleotide sequence for mcg7 cDNA was extended 5' with genomic DNA sequence from
GenBank accession number AC000134 (positions 1-321) and analysed for additional coding
sequence 5' to the putative initiation codon (nt 681-683) (Fig. 16). An additional in-frame ATG
occurs at position nt 495-497 when the alternatively spliced exon (position nt 504-609) is present
(also shown in Fig. 13(a)). This closely matches the Kozak consensus. When this exon is
absent, then the ATG is not in-frame and other possible initiation codons are absent (resulting
translation shown in lower case lettering) (also shown in Fig. 13(b)). Further evidence that the
initiation codon at position nt 681-683 is the true initiation site is given in Figure 15.
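
The reading-frame statement above can be checked arithmetically from the quoted coordinates. The following is a minimal sketch; the helper name in_frame is hypothetical and the positions are those given for Fig. 16 (upstream ATG at nt 495, putative initiation codon at nt 681, alternatively spliced exon at nt 504-609).

```python
def in_frame(upstream_start, downstream_start, removed=()):
    """Return True if two codon start positions share a reading frame.

    `removed` lists (first, last) nucleotide intervals, inclusive, that are
    spliced out between the two positions.
    """
    distance = downstream_start - upstream_start
    for first, last in removed:
        if upstream_start < first and last < downstream_start:
            distance -= last - first + 1
    return distance % 3 == 0


UPSTREAM_ATG = 495      # first nucleotide of the additional ATG
INITIATION_ATG = 681    # first nucleotide of the putative initiation codon
SPLICED_EXON = (504, 609)

exon_length = SPLICED_EXON[1] - SPLICED_EXON[0] + 1
extra_codons = (INITIATION_ATG - UPSTREAM_ATG) // 3

print("exon length (nt):", exon_length)
print("exon present, in frame:", in_frame(UPSTREAM_ATG, INITIATION_ATG))
print("exon absent, in frame:", in_frame(UPSTREAM_ATG, INITIATION_ATG, [SPLICED_EXON]))
print("additional codons when exon present:", extra_codons)
```

With the exon present the two ATG codons are 186 nt apart (62 codons), which agrees with the difference between the 671 and 609 amino acid products described in Example 8. The exon is 106 nt long, which is not a multiple of three, so removing it shifts the upstream ATG out of frame.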

Alignment of human and partial murine mcg7 cDNA sequences is shown in Figure 15. The
putative initiation codon is at position nt 360-362. Both murine ESTs appear to have an
upstream in-frame stop codon at position nt 326-328, downstream of the differentially spliced
exon, and the sequence alignment thus suggests that this region represents the 5' UTR of mcg7.

Furthermore, similarity with the C. elegans homologue strongly suggests that the ATG codon at
position nt 360-362 encodes the N-terminus of MCG7.
EXAMPLE 13

Figure 17 shows data from experiments indicating that a truncated version of MCG7 when
expressed as a GST fusion protein (construct B in Fig. 18) can function as a Ras-guanine
nucleotide exchange factor. In brief, Ras (unprocessed and as a GST fusion protein) is loaded
with [3H]-GDP then incubated in the presence of excess cold GTP ± GST-MCG7. Full details of
this assay can be found in Porfiri et al.

EXAMPLE 14 

Nucleotide sequence data generated from cosmid clone cSRL-20h12 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned
to the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match GenBank entries T78563 (clone 113434), T09103 (clone
HIBBP12) and AA035643 (clone 471819). EST clones 113434 and 471819 were obtained from
Genome Systems Inc. and these DNAs were sequenced on both strands with gene-specific
primers (Table 5) to generate the cDNA sequence of mcg7 shown in Figures 13(a) and (b).

The cDNA sequence of mcg7 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al 1990) and
the coding region was assigned on the basis of showing homology to the C. elegans protein
F25B3.3 (Figure 14). The mcg7 cDNA composite was suspected to contain a single nucleotide
error that originated from clone 471819 and the correct nucleotide sequence was, therefore,
sought by reverse transcription-polymerase chain reaction (RT-PCR) of the cDNA fragment
from a human cDNA pool. Total RNA was extracted from a human lymphoblastoid cell line
using an RNeasy Mini Kit (Qiagen). cDNA synthesis was conducted with the reverse
transcriptase Superscript II RNaseH- (GIBCO, BRL) and random hexamers using the procedure
recommended by the manufacturer (GIBCO, BRL). One fortieth of the cDNA mix was
subjected to 35 cycles of PCR using the following cycling conditions: 94°C for 30 seconds, 58°C
for 30 seconds and 72°C for 90 seconds. The 50 µl reaction mix consisted of 1x reaction buffer
(Dade Scientific), 2mM dNTP mix, 20pmol of primers (see Table 6) MCG7UF (within the
variably spliced exon of Figure 13(b), between nucleotide positions 184-201) and SGCADRV2
(between nucleotide positions 866-846 of Figure 13(a)) and 10 units of Dynazyme (Dade
Scientific). The resulting PCR product was cloned into the pGEM-T vector (Promega) using
standard methodology and sequenced using gene-specific primers. The correct nucleotide
sequence of mcg7 (as shown in Figure 13(a)) matches that of the recently released GenBank entry
Y12336. A partial mouse mcg7 cDNA sequence can also be found in GenBank entry Y12339.

EXAMPLE 15 

The coding sequence of mcg7 was cloned into vectors for expression in both bacterial and
mammalian cells. In addition to the full-length constructs, the deletion constructs shown in
Figure 18 were designed to retain the guanine nucleotide exchange (GEF) domain. For
prokaryotic expression, the mcg7 coding region was inserted downstream of and in-frame with
the Sj26 cassette of the pGEX (Pharmacia) series of vectors (Smith and Johnson, 1988) using
standard cloning techniques (Sambrook et al, 1989). For mammalian expression, the mcg7
coding sequence was first myc-tagged at the N-terminus and then ligated into the expression
vector pc Exv-n using standard cloning techniques. Ligation junctions of the constructs were
sequenced, as the cloning strategies inadvertently changed or introduced additional amino acids
as shown below.

Construct (A): EST clone 113434 was digested with ApaI (Figure 13(a), nucleotide positions
1022 to >2416 (within the vector)), blunt-ended with T4 DNA polymerase according to the
specifications of the manufacturer (New England Biolabs) and ligated into the SmaI site of
pGEX-3X.

Sequence of the pGEX and mcg7 (underlined) junction:
                          pGEX-3X        mcg7 (1022)
Sj26 ...                  GGG ATC CCC    CTG GTC        [SEQ ID NO:19]
additional amino acids    Gly Ile Pro

Construct (B): EST clone 113434 was digested with EcoRI (Figure 13(a), nucleotide
positions <695 (within the vector) to 1711) and ligated into the EcoRI site of pGEX-1.

Sequence of the pGEX and mcg7 (underlined) junction:
                          pGEX-1               mcg7 (695)
Sj26 ...                  GAA TTC GGC ACG AG   C CGA CGG    [SEQ ID NO:20]
additional amino acids    Glu Phe Gly Thr Ser

Construct (C): full-length mcg7: The pGEM-T clone containing the 5' end of the mcg7 coding
region was digested with ApaI (subsequently blunt-ended with T4 DNA polymerase) and BstXI
to liberate the fragment between nucleotide positions 336 and 830 of Figure 13(a). Clone
113434 was digested with BstXI and HindIII (vector derived) to liberate a fragment between
nucleotide positions 830 and >2416 (vector derived) of Figure 13(a). A pGEM-11Zf vector
(Promega) containing the myc-tag was digested with ApaI (subsequently blunt-ended with T4
DNA polymerase) and HindIII, and ligated with the 2 inserts described above.

Sequence of the myc-tag/mcg7 junction [SEQ ID NOs:21/22]:

myc-tag                            vector BamHI         mcg7 5' UTR (337) start
ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG  CCCGGGGCAGCTggatccG  CAGCCCACCCCGCGCCGGCGGCCATG
M  E  Q  K  L  I  S  E  E  D  L    P  G  A  A  G  S     A  A  H  P  A  P  A  A  M
                                   additional amino acids

The myc-tagged full-length mcg7 insert in pGEM-11Zf was then excised with SacI and HindIII
(both vector derived) and directionally cloned into the mammalian expression vector pEXV
(Beranger et al, 1994).

Construct (D): Construct (C) in pGEM-11Zf was sequentially digested with HindIII (this site
was subsequently blunt-ended with T4 DNA polymerase) then BamHI, and ligated into pGEX-2T
digested with BamHI and SmaI. Digestion with BamHI removed the myc-tag of Construct (C).

Sequence of the pGEX and mcg7 [SEQ ID NO:23/24] (underlined) junction:
            pGEX-2T BamHI    mcg7 (337)
Sj26 ...    gga tcc GCA GCC CAC CCC GCG CCG GCG GCC ATG
            Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met
            additional amino acids
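
The additional amino acids listed for each junction can be checked by translating the junction codons with the standard genetic code. The sketch below is illustrative only: the helper names are hypothetical, the sequences are transcribed from the junction listings for Constructs (A) to (D), and the third codon of the Construct (D) junction is read as GCA on the assumption that it matches the corresponding codon of the Construct (C) junction.

```python
# Standard genetic code in the conventional TCAG ordering (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names, as used in the junction listings.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "Ter",
}


def translate(dna):
    """Translate an in-frame DNA string (spaces and case ignored) to one-letter codes."""
    dna = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def three_letter(dna):
    """Translate and return space-separated three-letter residue names."""
    return " ".join(THREE_LETTER[aa] for aa in translate(dna))


# Vector-derived codons at each junction, transcribed from the listings.
JUNCTION_A = "GGG ATC CCC"
JUNCTION_B = "GAA TTC GGC ACG AGC"
JUNCTION_C = (
    "ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG"   # myc-tag
    "CCCGGGGCAGCT"                        # vector
    "ggatcc"                              # BamHI site
    "GCAGCCCACCCCGCGCCGGCGGCCATG"         # mcg7 5' UTR to start codon
)
JUNCTION_D = "gga tcc GCA GCC CAC CCC GCG CCG GCG GCC ATG"  # GCA is an assumed reading

print("A:", three_letter(JUNCTION_A))
print("B:", three_letter(JUNCTION_B))
print("C:", translate(JUNCTION_C))
print("D:", three_letter(JUNCTION_D))
```

Under these assumptions the translations agree with the residues given in the listings, including Ile as the second additional residue of Construct (A).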



EXAMPLE 16 



Overnight bacterial cultures containing the pGEX plasmid were used to inoculate 500ml of Luria
Broth media containing 50 µg/ml ampicillin. The cultures were grown to an OD of ~0.8 and then
induced with 1mM of IPTG for up to 3 hours at 37°C. The bacteria were pelleted and
resuspended in 15 ml of STE buffer (10mM Tris pH 8.0, 150 mM NaCl and 1mM EDTA) with
1 mg/ml lysozyme. The mixture was left on ice for more than 1 hour and subsequent steps were
performed at 4°C. Protease inhibitors aprotinin, pepstatin and leupeptin were added at final
concentrations of 25 µg/ml, prior to the addition of Triton-X-100 (2% v/v final) and n-lauroyl
sarcosine (1.5% w/v final). The lysate was sonicated for ~1 minute and pelleted at 14,000 x g
for 15 minutes. 100 µl of 50% w/v glutathione-sephadex bead slurry (in PBS) was added per
ml of supernatant. Following a 30 minute incubation at 4°C, the beads were washed three times
with NETN (20mM Tris-HCl pH 8.0, 100mM NaCl, 1mM EDTA, 0.5% NP40), once with
NETN-HS (equivalent to NETN but with 1M NaCl), and once in NETN. The bound protein
was directly analysed by SDS-polyacrylamide gel electrophoresis (PAGE) as described below
or the bound protein was eluted from the beads with the following elution buffer (50mM Tris pH
8.0, 150mM NaCl, 5mM MgCl2, 1mM DTT, 10mM reduced glutathione) for use in GDP release
assays.



EXAMPLE 17

Twenty microlitres of GST-sepharose-bound MCG7 were added to an equal volume of 2 x
sample loading dye (100mM Tris pH6.8, 2% v/v mercaptoethanol, 4% w/v SDS, 0.2% w/v
bromophenol blue, 20% v/v glycerol), boiled for 5 min and loaded onto a 7.5% w/v SDS-PAGE
gel (Sambrook et al, 1989). The Coomassie brilliant blue stained gel (Sambrook et al, 1989)
typically displayed a protein doublet, running between 87-95 kDa, consisting of the MCG7-GST
fusion and a slightly smaller, co-purified contaminating E. coli protein of ~105kDa. The
calculated molecular weight of full-length MCG7 is 77.5 kDa (Construct (D)) and the GST
component has a molecular weight of 26kDa, giving a predicted fusion protein of 103.5 kDa;
hence, the recombinant protein runs slightly smaller than predicted. A Western blot of the same
gel probed with anti-GST antibody yields an MCG7-specific band at the same position as that of
the stained gel.

EXAMPLE 18 

Assumptions: (a) GST-Ras molecular weight = 50 kD; (b) Concentration of GST-Ras solution
= 1mg/ml = 20 µM; (c) [3H]-GDP is 1mCi/ml and 13.3Ci/mmol, therefore [3H]-GDP
concentration = 75 µM and 1pmol [3H]-GDP = 15,466 cpm; (d) Elution buffer = Buffer E = 20
mM Tris-Cl, pH7.5; 50mM NaCl; 5mM MgCl2; 1mM DTT (added just before use). Buffer E
+ BSA = Buffer E + 1mg/ml BSA (added just before use).

Mix together, in the following order and mix well after each addition:
10 µl (=10 µg) GST-Ras (@1mg/ml in Buffer E), 463 µl Buffer E + BSA, 7 µl [3H]-GDP, 10 µl
490 µM EDTA. Incubate @ RT for 10 min. Add 10 µl 0.5 M MgCl2 and mix well. Incubate
@ RT for 10 min. Place on ice. During the first incubation the excess EDTA concentration is
5mM, during the second incubation the excess Mg concentration is 5mM. The [3H]-GDP
concentration is 1 µM and the final concentration of GST-Ras is 400nM. Thus 20 µl of the final
mix will contain 8pmol of GST-Ras protein. Specific activity of GDP is 15,466 cpm/pmol x
(1/1.4) = 11,047 cpm/pmol.
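
The concentrations stated in this Example follow from the listed assumptions and volumes, as the short calculation below illustrates. It is a sketch only: the variable names are hypothetical, all five additions are taken to be in microlitres (10, 463, 7, 10 and 10, giving a final volume of 500 µl), and the 1/1.4 factor is applied exactly as stated without further interpretation.

```python
# Stated assumptions (a) to (c).
GST_RAS_MW_G_PER_MOL = 50_000.0
GST_RAS_STOCK_MG_PER_ML = 1.0
GDP_ACTIVITY_MCI_PER_ML = 1.0
GDP_SPECIFIC_ACTIVITY_CI_PER_MMOL = 13.3
CPM_PER_PMOL = 15_466.0

# Volumes added in order, assumed to be in microlitres.
VOLUMES_UL = {
    "GST-Ras": 10.0,
    "Buffer E + BSA": 463.0,
    "[3H]-GDP": 7.0,
    "EDTA": 10.0,
    "MgCl2": 10.0,
}
total_ul = sum(VOLUMES_UL.values())

# 1 mg/ml equals 1 g/l; dividing by g/mol gives mol/l, then convert to micromolar.
ras_stock_uM = GST_RAS_STOCK_MG_PER_ML / GST_RAS_MW_G_PER_MOL * 1e6

# mCi/ml divided by Ci/mmol gives 1e-3 mmol/ml, i.e. 1e-3 mol/l, i.e. 1e3 micromolar.
gdp_stock_uM = GDP_ACTIVITY_MCI_PER_ML / GDP_SPECIFIC_ACTIVITY_CI_PER_MMOL * 1e3

# Final concentrations after dilution into the complete mix.
ras_final_nM = ras_stock_uM * VOLUMES_UL["GST-Ras"] / total_ul * 1e3
gdp_final_uM = gdp_stock_uM * VOLUMES_UL["[3H]-GDP"] / total_ul

# Amount of GST-Ras in a 20 microlitre aliquot (nM equals fmol per microlitre).
ras_pmol_in_20ul = ras_final_nM * 20.0 / 1e3

# Specific activity after applying the stated 1/1.4 factor.
corrected_cpm_per_pmol = CPM_PER_PMOL / 1.4

print(f"final volume: {total_ul:.0f} ul")
print(f"GST-Ras stock: {ras_stock_uM:.1f} uM")
print(f"[3H]-GDP stock: {gdp_stock_uM:.1f} uM")
print(f"GST-Ras final: {ras_final_nM:.0f} nM")
print(f"[3H]-GDP final: {gdp_final_uM:.2f} uM")
print(f"GST-Ras in 20 ul: {ras_pmol_in_20ul:.1f} pmol")
print(f"corrected specific activity: {corrected_cpm_per_pmol:.0f} cpm/pmol")
```

The computed values reproduce the figures in the text: approximately 75 µM [3H]-GDP stock, approximately 1 µM [3H]-GDP and 400 nM GST-Ras in the final mix, 8 pmol of GST-Ras per 20 µl, and 11,047 cpm/pmol.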

EXAMPLE 19

Exchange Ras with labelled GDP as above. Add unlabelled GTP (stock = 100mM, pH7) to 1
mM. Adjust Mg concentration by adding 5 µl 0.5M EDTA to labelled Ras, 5 µl 0.5M EDTA to
500 µl MCG7, and 5 µl 0.5M EDTA to 500 µl Buffer E + BSA. On ice set up microfuge tubes
with 40 µl Ras-GDP (in triplicate) with 40 µl MCG7 or Buffer E + BSA (control). Transfer tubes
to heat block @ 25°C and incubate for 10, 20 or 30 min. Stop exchange reactions with 1ml of
ice cold Buffer E and place on ice. Pre-soak nitrocellulose filters, pore size 0.45 µm, in Buffer E.
Assemble the vacuum manifold apparatus (Millipore) with wet filters and plug the wells with
rubber bungs. Switch on the vacuum pump. Remove the first plug, aliquot the sample and once
it has been sucked through, wash the filter with 10ml of ice cold Buffer E. Remove the next plug
and continue round the manifold. Take the manifold apart. Pin the filters to a pin board reserved
for [3H]. Air dry. Take up in 4ml scintillation fluid and count. These studies have been carried
out with a truncated MCG7-GST fusion protein (amino acid 341 of Figure 13a to the stop codon,
encoded within Construct (B)).

EXAMPLE 20

A human gene was identified from chromosome 11q13 that encodes a new member of the DnaJ
family of proteins (designated MCG18). This gene (mcg18) is expressed as an ~1.4kb mRNA
(Fig. 28) and is predicted to encode a 241 amino acid product (Fig. 19).

EXAMPLE 21 

MCG18 has partial homology to E. coli dnaJ and other human DnaJ family members in that it 
contains the J domain (Fig. 20). 

EXAMPLE 22 

MCG18 has greatest homology to functionally undefined proteins from C. elegans (Fig. 21) and
S. pombe (Fig. 22) that also feature the J domain but maintain sequence similarity through the
central and C-terminal regions of the proteins.

EXAMPLE 23

The J domain is proposed to mediate interaction with heat shock protein 70 (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein. One of these
proteins, tumorous imaginal discs (Tid58) from Drosophila virilis (Fig. 23), functions as a
tumour suppressor.

EXAMPLE 24 

A comparison of homology between MCG18 and human DnaJ proteins HDJ-2/HSDJ,
HDJ-1/HSP40 and HSJ1 is shown in Fig. 24.

EXAMPLE 25 

During the sequence characterisation of the VRF/VEGFB promoter region on cosmid CLGW4
[Grimmond et al 1996], which maps to chromosome 11q13, the inventors identified a sequence
that exactly matched numerous human and mouse expressed sequence tags (ESTs) in the EST
database from a gene which we designated mcg18. EST clones for human (GenBank accession
number T69741, clone 108172; accession number H40901, clone 177008) and mouse mcg18
(accession number W34884, clone 350966; accession number W64183, clone 385535) were
obtained from Genome Systems Inc. and sequenced with the gene-specific primers shown in
Table 7. The EST clones listed in Table 8 were also utilised in generating the full-length coding
sequence for human (Figure 19) and mouse (Figure 25) mcg18. The EST database also
contained mcg18 cDNA entries that were alternately (or partially) spliced, and in order to
understand their ability to encode new polypeptides, the gene structure of mcg18 was determined
by sequencing human and mouse genomic templates with gene-specific primers.

Genomic fragments containing the human [Grimmond et al, 1996] and murine genes [Townson
et al, 1996] have been previously reported. Cosmid CLGW4 contains the entire human gene
and A 121 contains the entire mouse gene, as determined by direct sequencing of the templates
with the oligonucleotides listed in Table 7. Plasmids containing sub-fragments of A 121 and
cosmid CLGW4 were prepared using plasmid purification kits (Qiagen) and sequenced as
described previously [Grimmond et al, 1996; Townson et al, 1996] using primers designed
against cDNA and genomic sequences. The BLAST suite of programs [Altschul et al, 1990]
was used to compare the sequence data against the nucleotide and protein databases at the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). The sequence
data were compiled using MacVector 4.2.1 software (IBI-Kodak). ClustalW sequence
alignments [Thompson et al, 1994] were conducted using the Australian National Genome
Information Service computer facility at the University of Sydney, Australia.

The cDNA sequence of human mcg18 (Figure 19) was translated in all possible reading frames 
and compared to the GenBank non-redundant protein database using the program BLASTX 
[Altschul et al, 1990]. The coding region was identified on the basis of its homology to 
the DnaJ family of proteins (Figure 20). The DnaJ domain is encoded within the longest open 
reading frame and the assigned initiation codon is preceded by an in-frame stop codon (Figure 
27). Similar database search results were obtained for the mouse mcg18 cDNA, and the 
alignment of human and mouse protein sequences is shown in Figure 26. MCG18 has greatest 
homology to gene products from C. elegans (Figure 21) and S. pombe (Figure 22). Although 
it shares a similar J-domain, MCG18 does not contain other domains described for the tumour 
suppressor gene from D. virilis (Figure 23), nor is it a homologue of other reported human 
J-domain-containing proteins (Figure 24). 
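
By way of illustration only, the six-frame translation that precedes a BLASTX-style comparison may be sketched in Python as follows. The function names (reverse_complement, translate, six_frame) are hypothetical and form no part of the BLAST suite; the standard genetic code is assumed, and the test fragment is the first 53 nucleotides of SEQ ID NO:2 as given in the Sequence Listing.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate consecutive codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict[str, str]:
    """Translate both strands in all three reading frames (+1..+3, -1..-3)."""
    seq = seq.upper().replace(" ", "")
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# First 53 nucleotides of SEQ ID NO:2 (the CDS begins at nucleotide 30).
fragment = "TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG"
frames = six_frame(fragment)
print(frames["+3"])  # frame containing the annotated initiation codon
```

In this sketch the annotated initiation codon at nucleotide 30 falls in frame +3, so that frame ends with the residues Met Gly Leu Cys Lys Cys Pro Lys listed for SEQ ID NO:2. A database search program would compare each of the six translations against the protein database; that step is not reproduced here.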

To determine the expression pattern of mcg18, 15 µg of total cellular RNA (RNeasy Mini Kit, 
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2% 
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer 
using 20 x SSC (Sambrook et al, 1986). Filters were subsequently UV-fixed and hybridised 
overnight at 65°C to a radiolabelled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for 
mcg18. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried 
and exposed to X-ray film. This Northern analysis showed that mcg18 is expressed as a 1.4 kb 
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 28). 
TABLE 4 
ESTs matching mcg4 

accession number        seq. run      organism 

gb|AA399110|AA399110    zt89e06.s1    Soares testis NHT Homo sa... 
gb|N39612|N39612        yy51g06.s1    Homo sapiens cDNA clone 2... 
gb|AA514406|AA514406    nf57d01.s1    NCI_CGAP_Co3 Homo sapiens... 
gb|AA544946|AA544946    vk38e02.r1    Soares mouse mammary glan... 
gb|AA450076|AA450076    zx42a04.s1    Soares total fetus Nb2HF8... 
gb|AA535731|AA535731    nf88f07.s1    NCI_CGAP_Co3 Homo sapiens... 
gb|W79710|W79710        zd86f01.r1    Soares fetal heart NbHH19... 
gb|AA503531|AA503531    ne47e08.s1    NCI_CGAP_Co3 Homo sapiens... 
gb|AA450132|AA450132    zx42a04.r1    Soares total fetus Nb2HF8... 
gb|AA398068|AA398068    zt89f06.r1    Soares testis NHT Homo sa... 
gb|W60405|W60405        zd29h08.r1    Soares fetal heart NbHH19... 
gb|W81382|W81382        zd86f01.s1    Soares fetal heart NbHH19... 
gb|AA047617|AA047617    zf13f07.s1    Soares fetal heart NbHH19... 
gb|AA282175|AA282175    zt02d03.s1    NCI_CGAP_GCB1 Homo sapien... 
gb|AA242159|AA242159    my30d04.r1    Barstead mouse pooled org... 
gb|AA068680|AA068680    mm61a05.r1    Stratagene mouse embryoni... 
gb|W46766|W46766        zc36b07.s1    Soares senescent fibrobla... 
gb|N93704|N93704        zb51c04.s1    Soares fetal lung NbHL19W... 
gb|AA155210|AA155210    mr98e01.r1    Stratagene mouse embryoni... 
gb|AA366022|AA366022    EST76915      Pineal gland II Homo sapien... 
gb|AA037691|AA037691    zk34h12.s1    Soares pregnant uterus Nb... 
gb|W35374|W35374        zc07h03.s1    Soares parathyroid tumor ... 
dbj|C00696|C00696       HUMGS0008251, Human Gene Signature, ... 
gb|T98249|T98249        ye59a07.s1    Homo sapiens cDNA clone 1... 
gb|W21588|W21588        zb51c04.r1    Soares fetal lung NbHL19W... 
gb|H32171|H32171        EST107015     Rattus sp. cDNA 5' end. 
gb|AA108092|AA108092    mm89e06.r1    Stratagene mouse embryoni... 
gb|AA017857|AA017857    mh44d10.r1    Soares mouse placenta 4Nb... 
gb|AA037690|AA037690    zk34h12.r1    Soares pregnant uterus Nb... 
gb|AA531006|AA531006    nj07b11.s1    NCI_CGAP_Pr22 Homo sapien... 
gb|N46760|N46760        yy51g06.r1    Homo sapiens cDNA clone 2... 
gb|W23584|W23584        zc71d03.s1    Soares fetal heart NbHH19... 
gb|W42214|W42214        mc69h09.r1    Soares mouse embryo NbME1... 
gb|AA244877|AA244877    mx25a04.r1    Soares mouse NML Mus musc... 
gb|W32939|W32939        zc07h03.r1    Soares parathyroid tumor ... 
TABLE 5 

ESTs matching AA074703 (mcg4-related cDNA) 

Database: Non-redundant Database of GenBank EST Division 
1,222,625 sequences; 449,352,662 total letters. 

Sequences producing High-scoring Segment Pairs: High Score, Smallest Sum Probability P(N), N 

accession number        seq. run      organism                       score   E value   N 

gb|AA074703|AA074703    zm76g07.r1    Stratagene neuroepitheli...    -0e 
TABLE 6 
mcg7-specific oligonucleotides 

name             sequence (5' to 3')             SEQ ID NOs. 

M1044R           GGA CAA AGT GTG TGA TGA ACC     SEQ ID NO:25 
MCG7-GEF-REV2    CTC ATC CTC CGT CTG ATA CTG     SEQ ID NO:26 
M7R              GTA GAT GTG GAT CAG CTT GG      SEQ ID NO:27 
MCG7 CA FOR      AGG TGG AGA ATG GTC AAGG        SEQ ID NO:28 
MCG7-GEF-REV     GTC ATA GTC TGT CTC CTA CT      SEQ ID NO:29 
MCG7 GEFFOR      ACA TAG ACA GCG TGC CTA CC      SEQ ID NO:30 
MCG7-PKC-REV     TAC AAC CTT AGG GAC ACC AG      SEQ ID NO:31 
MCG7-PKC-FOR     TGC TGA GCC TGC TCA CGG TG      SEQ ID NO:32 
T09103F          CAA GTG AAC AGC ACG TCC         SEQ ID NO:33 
M7F              GAC TAT CTC AAG GAC CAG CTG     SEQ ID NO:34 
MCG7UF           GGT TCG GTC CGA GCC CGG         SEQ ID NO:35 
SGCADRV2         GGA GCG ATA CTC CAA GTA GGT     SEQ ID NO:36 
TABLE 7 

mcg18-SPECIFIC OLIGONUCLEOTIDES 

name         sequence 5' to 3' 

HVESTF       AGC GGG CCA GGC CCC TTC [SEQ ID NO:37] 
HV195F       CAT CCT GGT CCA ATG CGC TC [SEQ ID NO:38] 
HV387F2      GCA CTG AGG AAG TTA AAC GAG C [SEQ ID NO:39] 
HV408R       GCT CGT TTA ACT TCC TCA GTG C [SEQ ID NO:40] 
EXON1REV     GCT CAG CTC CAC AAA GCG GCT [SEQ ID NO:41] 
HVEST426F    ACC AGC TCC GCT CAG GTA G [SEQ ID NO:42] 
HVEST623R    TCC AGG AGC TGT GTG TTT GG [SEQ ID NO:43] 
SGVESTF3     CCA GTT TCA CAG CGT GAG G [SEQ ID NO:44] 
HVEST631R    CAG CAT GAG GAG GAG GCA G [SEQ ID NO:45] 
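
As an illustrative check only, the relationship between a forward and a reverse primer annealing to the same site can be confirmed by taking a reverse complement. The sketch below, in which the function name reverse_complement is hypothetical, applies this to HV387F2 [SEQ ID NO:37 to 45 are as listed in Table 7] and shows that HV408R [SEQ ID NO:40] is the exact reverse complement of HV387F2 [SEQ ID NO:39].

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(primer: str) -> str:
    """Reverse-complement a primer written 5' to 3'; spaces are ignored."""
    return primer.upper().replace(" ", "").translate(COMPLEMENT)[::-1]


HV387F2 = "GCA CTG AGG AAG TTA AAC GAG C"  # SEQ ID NO:39, from Table 7
HV408R = "GCT CGT TTA ACT TCC TCA GTG C"   # SEQ ID NO:40, from Table 7

# True when the two primers anneal to opposite strands of the same site.
same_site = reverse_complement(HV387F2) == HV408R.replace(" ", "")
print(same_site)
```

The same comparison can be applied to any other pair in Tables 6 and 7; no claim is made here about pairs that have not been checked.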
TABLE 8 

EST CLONE SEQUENCES USED TO GENERATE HUMAN AND MOUSE 
mcg18 cDNA SEQUENCE COMPOSITES 

EST clone number    organism    GenBank accession number 

lg2815              human       D45683 
0O1-T2-18           human       F17225 
273748              human       N37043 
177008              human       H40901 and H40939 
258011              human       N30776 
276887              human       N44004 
108172              human       T69741 
307529              human       W21083 and W32579 
342027              human       W60283 
354288              mouse       W44038 
350966              mouse       W34884 
426261              mouse       AA002868 
368185              mouse       W53911 
385535              mouse       W64183 
404472              mouse       W82959 
406437              mouse       W83482 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: (OTHER THAN US): The Council of The Queensland Institute of 
Medical Research 

(US ONLY): HAYWARD Nicholas, SILINS Ginters, GRIMMOND Sean, 
GARTSIDE Michael and HANCOCK John 

(ii) TITLE OF INVENTION: NOVEL GENE AND USES THEREFOR 

(iii) NUMBER OF SEQUENCES: 45 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: DAVIES COLLISON CAVE 

(B) STREET: 1 LITTLE COLLINS STREET 

(C) CITY: MELBOURNE 

(D) STATE: VICTORIA 

(E) COUNTRY: AUSTRALIA 

(F) ZIP: 3000 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT INTERNATIONAL 

(B) FILING DATE: 22-MAY-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PO6973 

(B) FILING DATE: 23-MAY-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PO6974 

(B) FILING DATE: 23-MAY-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PO6972 

(B) FILING DATE: 23-MAY-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1459 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1460 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1458 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

Cys Xaa Xaa Cys Xaa Gly Xaa Gly 
1 5 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 30..959 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53 
Met Gly Leu Cys Lys Cys Pro Lys 
1 5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
10 15 20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149 
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr 
25 30 35 40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys 
45 50 55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245 
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp 
60 65 70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293 
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg 
75 80 85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile 
90 95 100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105 110 115 120 

AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437 
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu 
125 130 135 

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485 
Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp 
140 145 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533 
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
155 160 165 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
170 175 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His 
185 190 195 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
205 210 215 

ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725 
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp 
220 225 230 

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
235 240 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg 
250 255 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869 
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265 270 275 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
285 290 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser * 
300 305 310 

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022 
AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082 
CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142 
GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202 
ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242 
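
For illustration only, the consistency of a CDS feature location with the length of the encoded polypeptide can be checked by simple arithmetic: a location of 30..959 spans 959 - 30 + 1 = 930 nucleotides, that is 310 codons, which agrees with the 310 amino acids stated for SEQ ID NO:3. A minimal Python sketch follows; the function name cds_summary is hypothetical, and the sketch assumes a simple "start..end" location that excludes the stop codon.

```python
def cds_summary(location: str) -> tuple[int, int]:
    """Return (nucleotide length, codon count) for a 'start..end' location."""
    start, end = (int(part) for part in location.split(".."))
    length = end - start + 1  # coordinates are 1-based and inclusive
    if length % 3 != 0:
        raise ValueError(f"{location} is not a whole number of codons")
    return length, length // 3


nucleotides, codons = cds_summary("30..959")  # CDS feature of SEQ ID NO:2
print(nucleotides, codons)
```

A location whose span is not a multiple of three raises an error in this sketch, which may indicate a partial codon at one end of the annotated feature.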



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 310 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Gly Leu Cys Lys Cys Pro Lys Arg Lys Val Thr Asn Leu Phe Cys 
1 5 10 15 

Phe Glu His Arg Val Asn Val Cys Glu His Cys Leu Val Ala Asn His 
20 25 30 

Ala Lys Cys Ile Val Gln Ser Tyr Leu Gln Trp Leu Gln Asp Ser Asp 
35 40 45 

Tyr Asn Pro Asn Cys Arg Leu Cys Asn Ile Pro Leu Ala Ser Arg Glu 
50 55 60 

Thr Thr Arg Leu Val Cys Tyr Asp Leu Phe His Trp Ala Cys Leu Asn 
65 70 75 80 

Glu Arg Ala Ala Gln Leu Pro Arg Asn Thr Ala Pro Ala Gly Tyr Gln 
85 90 95 

Cys Pro Ser Cys Asn Gly Pro Ile Phe Pro Pro Thr Asn Leu Ala Gly 
100 105 110 

Pro Val Ala Ser Ala Leu Arg Glu Lys Leu Ala Thr Val Asn Trp Ala 
115 120 125 

Arg Ala Gly Leu Gly Leu Pro Leu Ile Asp Glu Val Val Ser Pro Glu 
130 135 140 

Pro Glu Pro Leu Asn Thr Ser Asp Phe Ser Asp Trp Ser Ser Phe Asn 
145 150 155 160 

Ala Ser Ser Thr Pro Gly Pro Glu Glu Val Asp Ser Ala Ser Ala Ala 
165 170 175 

Pro Ala Phe Tyr Ser Arg Ala Pro Arg Pro Pro Ala Ser Pro Gly Arg 
180 185 190 

Pro Glu Gln His Thr Val Ile His Met Gly Asn Pro Glu Pro Leu Thr 
195 200 205 

His Ala Pro Arg Lys Val Tyr Asp Thr Arg Asp Asp Asp Arg Thr Pro 
210 215 220 

Gly Leu His Gly Asp Cys Asp Asp Asp Lys Tyr Arg Arg Arg Pro Ala 
225 230 235 240 

Leu Gly Trp Leu Ala Arg Leu Leu Arg Ser Arg Ala Gly Ser Arg Lys 
245 250 255 

Arg Pro Leu Thr Leu Leu Gln Arg Ala Gly Leu Leu Leu Leu Leu Gly 
260 265 270 

Leu Leu Gly Phe Leu Ala Leu Leu Ala Leu Met Ser Arg Leu Gly Arg 
275 280 285 

Ala Ala Ala Asp Ser Asp Pro Asn Leu Asp Pro Leu Met Asn Pro His 
290 295 300 

Ile Arg Val Gly Pro Ser 
305 310 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2415 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3..2188 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

CG ATT TCA TTC CTC GCT CCC CAC AGG TCC CTC TCC CCA AAA TAT TCC 47 
Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser 
1 5 10 15 

CAT CTT GTC CTA GCC CAT CCC CCA GAC TAT CTC AAG GAC CAG CTG TCC 95 
His Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser 
20 25 30 

CCA CGC CCC CGA CCT CCA CTA GGC CTG TGC CAC CCG CTG CCT GCA GGA 143 
Pro Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly 
35 40 45 

AGA CGC CCG GTC CCG GGC CGG GTT AGC CCC ATG GGA ACG CAG CGC CTG 191 
Arg Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu 
50 55 60 

TGT GGC CGC GGG ACT CAA GGC TGG CCT GGC TCA AGT GAA CAG CAC GTC 239 
Cys Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val 
65 70 75 

CAG GAG GCG ACC TCG TCC GCG GGT TTG CAT TCT GGG GTG GAC GAG CTG 287 
Gln Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu 
80 85 90 95 

GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC CTG GGC 335 
Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly 
100 105 110 

CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC CTG GAC 383 
Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp 
115 120 125 

AAG GGC TGC ACG GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC 431 
Lys Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe 
130 135 140 

GAT GAC TCC GGG AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC 479 
Asp Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu 
145 150 155 

ATG ATG CAC CCC TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG 527 
Met Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu 
160 165 170 175 

CTC CAC ATC TAC CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG 575 
Leu His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln 
180 185 190 

GTG AAA ACG TGC CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG 623 
Val Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala 
195 200 205 

GAG TTT GAC TTG AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG 671 
Glu Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys 
210 215 220 

GCT CTG CTA GAC CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC 719 
Ala Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp 
225 230 235 

ATA GAC AGC GTC CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG 767 
Ile Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg 
240 245 250 255 

AAC CCT GTG GGA CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC 815 
Asn Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His 
260 265 270 

CTG GAG CCC ATG GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC 863 
Leu Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg 
275 280 285 

TCC TTC TGC AAG ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT 911 
Ser Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His 
290 295 300 

GGC TGC ACT GTG GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC 959 
Gly Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe 
305 310 315 

AAC AGC GTC TCA CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA 1007 
Asn Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr 
320 325 330 335 

GCC CCG CAG CGG GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG 1055 
Ala Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu 
340 345 350 

AAG CTG CTA CAG CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG 1103 
Lys Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly 
355 360 365 

GGC CTG AGC CAC AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC 1151 
Gly Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His 
370 375 380 

GTT AGC CCT GAG ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG 1199 
Val Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val 
385 390 395 

ACG GCG ACA GGC AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT 1247 
Thr Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys 
400 405 410 415 

GTG GGC TTC CGC TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG 1295 
Val Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val 
420 425 430 

GCC CTG CAG CTG GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG 1343 
Ala Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg 
435 440 445 

CTC AAC GGG GCC AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG 1391 
Leu Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu 
450 455 460 

GCC ATG GTG ACC AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG 1439 
Ala Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu 
465 470 475 

CTG AGC CTG CTC ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG 1487 
Leu Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu 
480 485 490 495 

CTG TAC CAG CTG TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA 1535 
Leu Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro 
500 505 510 

ACC AGC CCC ACG AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG 1583 
Thr Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu 
515 520 525 

GAG TGG ACC TCG GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG 1631 
Glu Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val 
530 535 540 

GAG CAC ATC GAG AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC 1679 
Glu His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val 
545 550 555 

GAT GGG GAT GGC CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG 1727 
Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly 
560 565 570 575 

AAC TTC CCT TAC CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT 1775 
Asn Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp 
580 585 590 

GGC TGC ATC AGC AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC 1823 
Gly Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser 
595 600 605 

TCT GTG TTG GGG GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC 1871 
Ser Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser 
610 615 620 

AAC TCC TTG CGC CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG 1919 
Asn Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu 
625 630 635 

GGC ATC TAC AAG CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC 1967 
Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys 
640 645 650 655 

CAC AAG CAG TGC AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC 2015 
His Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala 
660 665 670 

CAG AGT GTG AGC CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC 2063 
Gln Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His 
675 680 685 

AGC CAC CAT CAC CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG 2111 
Ser His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg 
690 695 700 

CGA GGC TCC AGG CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG 2159 
Arg Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val 
705 710 715 

GAG GAT GGG GTG TTT GAC ATC CAC TTG TA ATAGATGCTG TGGTTGGATC 2208 
Glu Asp Gly Val Phe Asp Ile His Leu 
720 725 

AAGGACTCAT TCCTGCCTTG GAGAAAATAC TTCAACCAGA GCAGGGAGCC TGGGGGTGTC 2268 
GGGGCAGGAG GCTGGGGATG GGGGTGGGAT ATGAGGGTGG CATGCAGCTG AGGGCAGGGC 2328 
CAGGGCTGGT GTCCCTAAGG TTGTACAGAC TCTTGTGAAT ATTTGTATTT TCCAGATGGA 2388 
ATAAAAAGGC CCGTGTAATT AACCTTC 2415 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 728 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser His 
1 5 10 15 

Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser Pro 
20 25 30 

Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly Arg 
35 40 45 

Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu Cys 
50 55 60 

Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val Gln 
65 70 75 80 

Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu Gly 
85 90 95 

Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly Pro 
100 105 110 

Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp Lys 
115 120 125 

Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp 
130 135 140 

Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met 
145 150 155 160 

Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu 
165 170 175 

His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val 
180 185 190 

Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu 
195 200 205 

Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala 
210 215 220 

Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile 
225 230 235 240 

Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn 
245 250 255 

Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu 
260 265 270 

Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser 
275 280 285 

Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly 
290 295 300 

Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn 
305 310 315 320 

Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala 
325 330 335 

Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys 
340 345 350 

Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly 
355 360 365 

Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val 
370 375 380 

Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr 
385 390 395 400 

Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val 
405 410 415 

Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala 
420 425 430 

Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu 
435 440 445 

Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala 
450 455 460 

Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu 
465 470 475 480 

Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu 
485 490 495 

Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr 
500 505 510 

Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu 
515 520 525 

Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu 
530 535 540 

His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp 
545 550 555 560 

Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn 
565 570 575 

Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly 
580 585 590 

Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser 
595 600 605 

Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn 
610 615 620 

Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly 
625 630 635 640 

Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His 
645 650 655 

Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln 
660 665 670 

Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser 
675 680 685 

His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg 
690 695 700 

Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu 
705 710 715 720 

Asp Gly Val Phe Asp Ile His Leu 
725 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 254..2083 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAGC CCCATGGGAA 180 

CGGGGTTCGG TCCGAGCCCG GTGGGAGGCT CCCGGAGCGC AGCCTGGGCC CAGCCCACCC 240 

CGCGCCGGCG GCC ATG GCA GGC ACC CTG GAC CTG GAC AAG GGC TGC ACG 289 
Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr 
1 5 10 

GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC GAT GAC TCC GGG 337 
Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly 
15 20 25 

AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC ATG ATG CAC CCC 385 
Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro 
30 35 40 

TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG CTC CAC ATC TAC 433 
Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr 
45 50 55 60 

CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG GTG AAA ACG TGC 481 
Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys 
65 70 75 

CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG GAG TTT GAC TTG 529 
His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu 
80 85 90 

AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG GCT CTG CTA GAC 577 
Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp 
95 100 105 

CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC ATA GAC AGC GTC 625 
Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val 
110 115 120 

CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG AAC CCT GTG GGA 673 
Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly 
125 130 135 140 

CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC CTG GAG CCC ATG 721 
Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met 
145 150 155 

GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC TCC TTC TGC AAG 769 
Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys 
160 165 170 

ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT GGC TGC ACT GTG 817 
Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val 
175 180 185 

GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC AAC AGC GTC TCA 865 
Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser 
190 195 200 

CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA GCC CCG CAG CGG 913 
Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg 
205 210 215 220 

GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG AAG CTG CTA CAG 961 
Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln 
225 230 235 

CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG GGC CTG AGC CAC 1009 
Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His 
240 245 250 

AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC GTT AGC CCT GAG 1057 
Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu 
255 260 265 

ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG ACG GCG ACA GGC 1105 
Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly 
270 275 280 

AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT GTG GGC TTC CGC 1153 
Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg 
285 290 295 300 

TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG GCC CTG CAG CTG 1201 
Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu 
305 310 315 

GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG CTC AAC GGG GCC 1249 
Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala 
320 325 330 

AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG GCC ATG GTG ACC 1297 
Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr 
335 340 345 

AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG CTG AGC CTG CTC 1345 
Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu 
350 355 360 

ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG CTG TAC CAG CTG 1393 
Thr Val Ser Leu Asp Gin Tyr Gin Thr Glu Asp Glu Leu Tyr Gin Leu 
365 370 375 380 

TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA ACC AGC CCC ACG 1441 
Ser Leu Gin Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr 
385 390 395 

AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG GAG TGG ACC TCG 1489 
Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser 
400 405 410 

GCT GCC AAA CCC AAG CTG GAT. CAG GCC CTC GTG GTG GAG CAC ATC GAG 1537 
Ala Ala Lys Pro Lys Leu Asp Gin Ala Leu Val Val Glu His lie Glu 
415 420 425 

AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC GAT GGG GAT GGC 1585 
Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly 
430 435 440 

CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG AAC TTC CCT TAC 1633 
His lie Ser Gin Glu Glu Phe Gin lie lie Arg Gly Asn Phe Pro Tyr 
445 450 455 460 

CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT GGC TGC ATC AGC 1681 
Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser
465 470 475 

AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC TCT GTG TTG GGG 1729 
Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly 
480 485 490 

GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC AAC TCC TTG CGC 1777 
Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg
495 500 505 

CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG GGC ATC TAC AAG 1825 
Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys
510 515 520 

CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC CAC AAG CAG TGC 1873 
Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys
525 530 535 540 

AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC CAG AGT GTG AGC 1921 
Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser
545 550 555 

CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC AGC CAC CAT CAC 1969 
Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser His His His 
560 565 570 

CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG CGA GGC TCC AGG 2017 
Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg 
575 580 585 

CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG GAG GAT GGG GTG 2065 
Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val
590 595 600 

TTT GAC ATC CAC TTG TAATAGATGC TGTGGTTGGA TCAAGGACTC ATTCCTGCCT 2120 
Phe Asp Ile His Leu
605 610 

TGGAGAAAAT ACTTCAACCA GAGCAGGGAG CCTGGGGGTG TCGGGGCAGG AGGCTGGGGA 2180
TGGGGGTGGG ATATGAGGGT GGCATGCAGC TGAGGGCAGG GCCAGGGCTG GTGTCCCTAA 2240 
GGTTGTACAG ACTCTTGTGA ATATTTGTAT TTTCCAGATG GAATAAAAAG GCCCGTGTAA 2300 
TTAACCTTC 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 609 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
1 5 10 15

Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly Lys Val Arg Asp
20 25 30

Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro Trp Tyr Ile Pro
35 40 45

Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr Gln Gln Ser Arg
50 55 60

Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys His Leu Val Arg
65 70 75 80

Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu Asn Pro Glu Leu
85 90 95

Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp Gln Glu Gly Asn
100 105 110

Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val Pro Thr Tyr Lys
115 120 125

Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly Gln Lys Lys Arg
130 135 140

Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met Glu Leu Ala Glu
145 150 155 160

His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys Ile Leu Phe Gln
165 170 175

Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val Asp Asn Pro Val
180 185 190

Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser Gln Trp Val Gln
195 200 205

Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg Ala Leu Val Ile
210 215 220

Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe
225 230 235 240

Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser
245 250 255 

Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu Thr Ile Lys Leu
260 265 270

Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly Asn Tyr Gly Asn
275 280 285

Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg Phe Pro Ile Leu
290 295 300

Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu Ala Leu Pro Asp
305 310 315 320

Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala Lys Met Lys Gln
325 330 335

Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr Ser Leu Arg Pro
340 345 350

Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu Thr Val Ser Leu
355 360 365

Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu Ser Leu Gln Arg
370 375 380

Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr Ser Cys Thr Pro
385 390 395 400

Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser Ala Ala Lys Pro
405 410 415

Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu Lys Met Val Glu
420 425 430 

Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly His Ile Ser Gln
435 440 445

Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr Leu Ser Ala Phe
450 455 460

Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
465 470 475 480

Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly Gly Arg Met Gly
485 490 495

Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys
500 505 510

Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys
515 520 525

Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu
530 535 540

Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser Leu Glu Gly Ser
545 550 555 560

Ala Pro Ser Pro Ser Pro Met His Ser His His His Arg Ala Phe Ser
565 570 575

Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg Pro Pro Glu Ile
580 585 590

Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val Phe Asp Ile His
595 600 605 

Leu 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 832 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 11..733

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp 
1 5 10 

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97 
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25 

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145 
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala 
30 35 40 45 

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193 
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu 
50 55 60 

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241 
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val 
65 70 75 

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289 
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90 

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337 
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105 

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385 
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120 125 

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433 
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140 

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481 
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155 

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170 

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577 
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185 

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625
Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673 
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220 

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721 
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235 

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773 
Gly Ala Gly Pro * 
240 

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832 
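
The relationship between the CDS location given for SEQ ID NO:8 (11..733) and the amino acid sequence printed beneath the codons can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code; the names `translate_cds`, `CODON_TABLE`, `THREE_LETTER` and `FIRST_LINE` are illustrative and are not part of the specification. Only the first printed line of SEQ ID NO:8 (nucleotides 1 to 49) is used as input.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (assumption: standard nuclear code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter codes, as used in the sequence listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate_cds(dna, start, end):
    """Translate the 1-based, inclusive region [start, end] of `dna`.

    Spaces in the printed listing are ignored. Any trailing partial
    codon is dropped. Returns a list of three-letter residue codes.
    """
    cds = dna.replace(" ", "").upper()[start - 1:end]
    usable = len(cds) - len(cds) % 3
    return [THREE_LETTER[CODON_TABLE[cds[pos:pos + 3]]]
            for pos in range(0, usable, 3)]


# First printed line of SEQ ID NO:8 (nucleotides 1..49).
FIRST_LINE = "GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG"

print(" ".join(translate_cds(FIRST_LINE, 11, 49)))
# prints: Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp
```

The full region 11..733 spans 723 nucleotides, that is 241 codons, which agrees with the 241 amino acids stated for SEQ ID NO:9; the TGA stop codon printed after the final Pro lies outside that range.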

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp Pro Arg Asn 
1 5 10 15 

Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg Ser Arg Pro
20 25 30

Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala Ser Thr Glu
35 40 45

Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu His Pro Asp
50 55 60

Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val Glu Leu Ser
65 70 75 80

Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg Ser Tyr Asp
85 90 95

Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg Thr Thr Val
100 105 110

His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr Pro Pro Asn
115 120 125

Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln Gly Pro Gln
130 135 140

Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu Gly Tyr Cys
145 150 155 160

Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile Ala Phe Arg
165 170 175

Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys Asp Arg Ile
180 185 190

Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg Ala Asn Arg
195 200 205

Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg Gln Pro Pro
210 215 220

Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg Gly Ala Gly
225 230 235 240

Pro 

SEQ ID Nos: 10-18 25-36 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 170..300

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAG CCC CAT 175 

Pro His 
1 

GGG AAC GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC 223 
Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser 
5 10 15 

CTG GGC CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC 271 
Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp 
20 25 30 

CTG GAC AAG GGC TGC ACG GTG GAG GAG CT 300 
Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Pro His Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu
1 5 10 15

Arg Ser Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr
20 25 30

Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu
35 40



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GGGATCCCCC TGGTC 15 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Asp Val Asp Glu Glu Asp Glu Val Glu Asp Ile Glu Phe
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Asp Val Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Asp His Asp Arg Asp Gly Phe Ile Ser Gln Glu Glu Phe
1 5 10

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met
1 5 10
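
The 13-residue peptides of SEQ ID NO:11 and SEQ ID NO:13 have the same length, so they can be compared position by position. The following is a minimal sketch of such a comparison; `percent_identity` is an illustrative helper, and simple ungapped identity is assumed here only for demonstration. It is not necessarily the similarity measure intended elsewhere in the specification.

```python
def percent_identity(seq_a, seq_b):
    """Compare two equal-length sequences of three-letter residue codes.

    Returns (number of identical positions, percent identity).
    No gaps or conservative substitutions are considered.
    """
    residues_a = seq_a.split()
    residues_b = seq_b.split()
    if len(residues_a) != len(residues_b):
        raise ValueError("sequences must have the same number of residues")
    matches = sum(a == b for a, b in zip(residues_a, residues_b))
    return matches, 100.0 * matches / len(residues_a)


# Peptides as given in the sequence listing.
SEQ_ID_11 = "Asp Val Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe"
SEQ_ID_13 = "Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met"

matches, pct = percent_identity(SEQ_ID_11, SEQ_ID_13)
print(matches, round(pct, 1))
# prints: 7 53.8
```

Seven of the thirteen positions are identical in this ungapped comparison.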

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp Val Asp Met Asp Gly Gln Ile Ser Lys Asp Glu Leu
1 5 10

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe Asn
1 5 10 15

Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser Arg
20 25 30

Leu Lys Glu Thr His
35 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Lys Phe Val His Val Ala Lys His Leu Arg Lys Ile Asn Asn Phe Asn
1 5 10 15

Thr Leu Met Ser Val Val Gly Gly Ile Thr His Ser Ser Val Ala Arg
20 25 30 

Leu Ala Lys Thr Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys Arg His
1 5 10 15

Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg
20 25 30

Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu Ser Val
35 40 45 

Glu Cys 
50 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

His Asn Phe His Glu Thr Thr Phe Leu Thr Pro Thr Thr Cys Asn His 
1 5 10 15

Cys Asn Lys Leu Leu Trp Gly Ile Leu Arg Gln Gly Phe Lys Cys Lys
20 25 30 

Asp Cys Gly Leu Ala Val His Ser Cys Cys Lys Ser Asn Ala Val Ala 
35 40 45 

Glu Cys 
50 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGGATCCCCC TGGTC 15

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GAATTCGGCA CGAGCCGACG G 21 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 78 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ATGGAGCAGA AGCTGATCTC CGAGGAGGAC CTGCCCGGGG CAGCTGGATC CGCAGCCCAC 60 
CCCGCGCCGG CGGCCATG 78 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

Met Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Pro Gly Ala Ala Gly
1 5 10 15 

Ser Ala Ala His Pro Ala Pro Ala Ala Met 
20 25 



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GGATCCGCAG CCCACCCCGC GCCGGCGGCC ATG 33 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met 
1 5 10

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GGACAAAGTG TGTGATGAAC C 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CTCATCCTCC GTCTGATACT G 21 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GTAGATGTGG ATCAGCTTGG 20 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
AGGTGGAGAA TGGTCAAGG 19 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GTCATAGTCT GTCTCCTACT 20

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
ACATAGACAG CGTGCCTACC 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
TACAACCTTA GGGACACCAG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TGCTGAGCCT GCTCACGGTG 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CAAGTGAACA GCACGTCC 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
GACTATCTCA AGGACCAGCT G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
GGTTCGGTCC GAGCCCGG 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GGAGCGATAC TCCAAGTAGG T 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
AGCGGGCCAG GCCCCTTC 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CATCCTGGTC CAATGCGCTC 20



(2) INFORMATION FOR SEQ ID NO: 39: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 
GCACTGAGGA AGTTAAACGA GC 22 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
GCTCGTTTAA CTTCCTCAGT GC 22

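
As printed, SEQ ID NO:39 and SEQ ID NO:40 are reverse complements of one another. The following is a minimal sketch of that check; the helper name `reverse_complement` is illustrative, and only unambiguous A, C, G and T bases are assumed.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    Spaces used for grouping in the printed listing are removed first.
    """
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Oligonucleotides as printed in the sequence listing.
SEQ_ID_39 = "GCACTGAGGA AGTTAAACGA GC"
SEQ_ID_40 = "GCTCGTTTAA CTTCCTCAGT GC"

print(reverse_complement(SEQ_ID_39) == SEQ_ID_40.replace(" ", ""))
# prints: True
```

The same helper can be applied to any other oligonucleotide pair in the listing to see whether the two strands are complementary.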
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 



GCTCAGCTCC ACAAAGCGGC T 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 



ACCAGCTCCG CTCAGGTAG 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 



TCCAGGAGCT GTGTGTTTGG 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 



CCAGTTTCAC AGCGTGAGG 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 



CAGCATGAGG AGGAGGCAG 

CLAIMS:

1. An isolated nucleic acid molecule comprising a sequence of nucleotides encoding or 
complementary to a sequence encoding an amino acid sequence having homology to a regulator 
of gene expression or a derivative of said gene regulator. 

2. An isolated nucleic acid molecule according to claim 1 wherein the regulator 
comprises a zinc finger domain of an (HC3)2 type.

3. An isolated nucleic acid molecule according to claim 2 wherein the sequence of 
nucleotides or complementary sequence of nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

4. An isolated nucleic acid molecule according to claim 1 wherein said gene regulator is 
a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

5. An isolated nucleic acid molecule according to claim 4 wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii).

6. An isolated nucleic acid molecule according to claim 1, wherein said gene regulator 
is a heat shock protein or is a heat shock binding protein or a derivative thereof. 

7. An isolated nucleic acid molecule according to claim 6, wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

8. A genetic construct comprising a vector portion and a gene portion comprising a 
regulator of gene expression or a derivative thereof.

9. A genetic construct according to claim 8 wherein the gene portion comprises a zinc 
finger domain of (HC3)2 type.

10. A genetic construct according to claim 9 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

11. A genetic construct according to claim 8 wherein said gene portion is a nucleotide
exchange factor (GEF) or derivative thereof. 

12. A genetic construct according to claim 11 wherein the gene portion comprises a
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6;

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO: 5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

13. A genetic construct according to claim 8 wherein the gene portion is a heat shock 
protein or a derivative thereof or a heat shock binding protein or derivative thereof. 

14. A genetic construct according to claim 13 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 



15. A nucleic acid molecule encoding a gene regulator having the identifying 
characteristics of a molecule selected from MCG4, MCG7 and MCG18 having respective amino 
acid sequences of SEQ ID NO:3, SEQ ID NO: 5 or 7 and SEQ ID NO:9. 

16. A method of detecting a condition caused or facilitated by an aberration in mcg4, said
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg4 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

17. A method of detecting a condition caused or facilitated by an aberration in mcg4, said 
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG4 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

18. A method for detecting MCG4 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG4 or its
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG4 
complex to form, and then detecting said complex. 

19. A method of detecting a condition caused or facilitated by an aberration in mcg7, said
method comprising determining the presence of a single or multiple nucleotide substitution,
deletion and/or addition or other aberration to one or both alleles of said mcg7 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

20. A method of detecting a condition caused or facilitated by an aberration in mcg7, said 
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG7 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

21. A method for detecting MCG7 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG7 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG7 
complex to form, and then detecting said complex. 

22. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising determining the presence of a single or multiple nucleotide substitution,
deletion and/or addition or other aberration to one or both alleles of said mcg18 wherein the
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

23. A method of detecting a condition caused or facilitated by an aberration in mcg18, said
method comprising screening for a single or multiple amino acid substitution, deletion and/or
addition to MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.

24. A method for detecting MCG18 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG18 or 
its derivatives or homologues for a time and under conditions sufficient for an antibody-MCG18 
complex to form, and then detecting said complex. 

FIG 1 (I)

FIG 1 (II)

FIG 1 (III)

FIG 1 (IV)

FIG 1 (V)

FIGURE 18 (Cont. I)

Plasmid name: clone 16 in pGEX-3X
Plasmid size: 6.00 kb

FIGURE 18 (Cont. II)

Plasmid name: clone 19 in pGEX-1

FIGURE 18 (Cont. III)